What's New in Metadata Filtering in Milvus v2.4.3
Milvus v2.4.3 introduced full string metadata matching! You can now match strings using prefix, infix, postfix, or even single-character wildcard searches.
# Prefix example: matches strings that start with "The".
expression='title like "The%"'
# Infix example: matches strings that contain "the" anywhere.
expression='title like "%the%"'
# Postfix example: matches strings that end with "Rye".
expression='title like "%Rye"'
# Single-character wildcard example.
expression='title like "Flip_ed"'
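To make the wildcard semantics concrete, here is a minimal pure-Python sketch that mimics the four patterns above with regular expressions. It only illustrates what `%` and `_` mean; it is not how Milvus evaluates filters, and it assumes matching is case-sensitive. The helper names `like_to_regex` and `like` and the sample titles are made up for this illustration.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern into an anchored regular expression.

    '%' matches any run of characters (including none); '_' matches exactly one.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the LIKE-style pattern."""
    return like_to_regex(pattern).match(value) is not None


# Hypothetical titles, used only to show which pattern selects what.
titles = ["The Matrix", "Catcher in the Rye", "Flipped", "Black Panther"]
for pattern in ["The%", "%the%", "%Rye", "Flip_ed"]:
    print(pattern, [t for t in titles if like(t, pattern)])
```

Note that `%the%` also selects "Black Panther", because "the" occurs inside "Panther"; infix matching works on raw substrings, not on words.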
Previous blog posts covered only prefix string matching. From Milvus v2.4.3 onward, a much wider range of operations is available, including exact matching on array values and checking whether any element of an array matches (contains_any()).
These updates make metadata filtering more versatile and powerful!
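As a rough sketch of the array checks, the helpers below only build filter strings. To my knowledge the Milvus expression language spells these operators `ARRAY_CONTAINS_ANY` and `ARRAY_CONTAINS_ALL`; treat that spelling, the helper names, and the field name `genres` as assumptions, and note that they only apply if the field is stored as an array.

```python
import json


def array_contains_any(field: str, values: list) -> str:
    """Filter that is true when the array field holds at least one of the values."""
    return f"ARRAY_CONTAINS_ANY({field}, {json.dumps(values, ensure_ascii=False)})"


def array_contains_all(field: str, values: list) -> str:
    """Filter that is true when the array field holds every one of the values."""
    return f"ARRAY_CONTAINS_ALL({field}, {json.dumps(values, ensure_ascii=False)})"


# Combine an array check with a scalar condition (field names are assumptions).
expression = "rating >= 7 && " + array_contains_any("genres", ["Sci-Fi", "Action"])
print(expression)
```

Building the list with `json.dumps` keeps the quoting of string values consistent instead of hand-assembling it.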
Let's walk through an example. For this blog, I use IMDB movie data downloaded from Kaggle.
# Import common libraries.
import sys, os, time, pprint
import pandas as pd
# Read the CSV data.
df = pd.read_csv("data/original_data.csv")
# Shorten the data for the demo.
df = df.tail(200)
display(df.head())
Each movie has a 'text' field containing its description and reviews.
Each movie also includes metadata such as release year, rating, genres, actors, and a list of keywords.
Each "row" represents a text chunk of a movie review, its vector representation, and metadata such as movie_id, movie title, poster link, genres, and actors.
We follow the usual RAG pattern:
1. Connect to Milvus: First, connect to Milvus Lite, a local deployment of Milvus. This is the database that stores and manages the vectors.
2. Convert the movie text to vectors: Take each movie's text field (description and reviews) and convert it to a vector, using the HuggingFace model BAAI/bge-large-en-v1.5.
3. Insert the vectors and metadata into Milvus: Insert each vector into Milvus together with its original text (called a "chunk") and its metadata (year, rating, genres, and so on).
4. Process the user's query: Convert the user's query to a vector with the same embedding model. Then run an approximate nearest neighbor search to find the data vectors closest to the query vector.
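To show how step 4 interacts with metadata filtering, here is a toy sketch in plain Python. It deliberately swaps the approximate nearest neighbor search for an exact brute-force scan, and the three-dimensional vectors, titles, and ratings are invented for illustration. Milvus does this work with indexes; the sketch only shows the idea of filtering on metadata and ranking by similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(rows, query_vector, predicate, top_k=1):
    """Exact search: keep rows that pass the metadata predicate, then rank by similarity."""
    candidates = [row for row in rows if predicate(row)]
    ranked = sorted(
        candidates,
        key=lambda row: cosine_similarity(row["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


# Invented toy data: real embeddings from bge-large-en-v1.5 have 1024 dimensions.
rows = [
    {"title": "Movie A", "rating": 8.1, "vector": [0.9, 0.1, 0.0]},
    {"title": "Movie B", "rating": 6.2, "vector": [1.0, 0.0, 0.0]},
    {"title": "Movie C", "rating": 7.5, "vector": [0.0, 1.0, 0.0]},
]
query = [1.0, 0.0, 0.0]
hits = filtered_search(rows, query, lambda row: row["rating"] >= 7, top_k=1)
print([hit["title"] for hit in hits])
```

Movie B is the closest vector to the query, but the rating filter excludes it, so the search returns Movie A instead.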
The full code is available on my GitHub.
First, connect to Milvus. You need to pip-install pymilvus. (Specifying just a local file name uses Milvus Lite, a local vector database. If you have another Milvus deployment, such as one running on Docker or K8s, or the fully managed Zilliz Cloud, you can connect by specifying a URI and token instead. The rest of the code works the same way.)
# !python -m pip install -U pymilvus
import pymilvus
# Connect a client to the Milvus Lite server.
from pymilvus import MilvusClient
client = MilvusClient("milvus_demo.db")
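For the URI-and-token case, one possible approach is to choose the connection settings from environment variables and fall back to the local Milvus Lite file. This is a sketch, not the code from my notebook: the variable names `MILVUS_URI` and `MILVUS_TOKEN` and both helper functions are hypothetical.

```python
import os


def milvus_connection_kwargs(env=os.environ):
    """Return keyword arguments for MilvusClient.

    MILVUS_URI and MILVUS_TOKEN are hypothetical names chosen for this sketch.
    """
    uri = env.get("MILVUS_URI")
    if not uri:
        # No remote endpoint configured: fall back to a local Milvus Lite file.
        return {"uri": "milvus_demo.db"}
    kwargs = {"uri": uri}
    token = env.get("MILVUS_TOKEN")
    if token:
        kwargs["token"] = token
    return kwargs


def connect(env=os.environ):
    """Create a MilvusClient from the selected settings."""
    # Imported lazily so the helper above works without pymilvus installed.
    from pymilvus import MilvusClient
    return MilvusClient(**milvus_connection_kwargs(env))
```

Calling `connect()` with neither variable set behaves like the local example above; with both set, the same code talks to a remote deployment.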
Next, chunk the text column containing the movie reviews and embed the chunks as vectors. Many resources already demonstrate how to do this, so I won't repeat the code here. Below, I show how to assemble the chunked text, vector representations, and metadata, and insert the data into Milvus.
# Create chunk_list and dict_list in a single loop.
# `chunks` and `converted_values` come from the chunking and embedding step (not shown).
# The column names Director, Genres, Actors, and Keywords are assumed from the dataset.
dict_list = []
for movie_index, title, chunk, vector, poster_url, director, \
    genres, actors, keywords, film_year, rating in zip(
        df.id, df.Name, chunks, converted_values, df.PosterLink,
        df.Director, df.Genres, df.Actors, df.Keywords,
        df.MovieYear, df.RatingValue):
    # Assemble the embedding vector, original text chunk, and metadata.
    chunk_dict = {
        'movie_index': movie_index,
        'title': title,
        'chunk': chunk.page_content,
        'poster_url': poster_url,
        'director': director,
        'genres': genres,
        'actors': actors,
        'keywords': keywords,
        'film_year': film_year,
        'rating': rating,
        'vector': vector,
    }
    dict_list.append(chunk_dict)
# Insert the data into the Milvus collection.
print("Start inserting entities")
start_time = time.time()
client.insert(
    COLLECTION_NAME,
    data=dict_list,
    progress_bar=True)
end_time = time.time()
print(f"Milvus insert time for {len(dict_list)} vectors: ", end="")
print(f"{round(end_time - start_time, 2)} seconds")
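Before inserting, it can help to check that the assembled rows are consistent, since a missing key or a vector of the wrong length is easier to diagnose locally than from an insert error. The helper below is a hypothetical sketch and not part of pymilvus; the name `validate_rows` and its parameters are my own.

```python
def validate_rows(rows, vector_field="vector", expected_dim=None):
    """Check that all rows share the same keys and vector length.

    Returns the vector dimension, or raises ValueError on the first mismatch.
    """
    if not rows:
        raise ValueError("no rows to insert")
    expected_keys = set(rows[0])
    if vector_field not in expected_keys:
        raise ValueError(f"rows have no '{vector_field}' field")
    dim = expected_dim if expected_dim is not None else len(rows[0][vector_field])
    for i, row in enumerate(rows):
        if set(row) != expected_keys:
            raise ValueError(f"row {i} has keys {sorted(row)}, expected {sorted(expected_keys)}")
        if len(row[vector_field]) != dim:
            raise ValueError(f"row {i} has a vector of length {len(row[vector_field])}, expected {dim}")
    return dim


# Tiny invented example; with bge-large-en-v1.5 the expected dimension would be 1024.
sample_rows = [
    {"title": "Movie A", "rating": 8.1, "vector": [0.1, 0.2, 0.3]},
    {"title": "Movie B", "rating": 6.2, "vector": [0.4, 0.5, 0.6]},
]
print(validate_rows(sample_rows))
```

In the notebook, this would be called as `validate_rows(dict_list, expected_dim=1024)` just before `client.insert`.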
Now that the data is in Milvus, we are ready to search!
Search with a string metadata filter
Suppose you want to find a highly rated science fiction movie set in a dystopian future that features a panther. The sample data has metadata that can be used for this search.
Below is an example of metadata filtering with fuzzy string matching. I simply wrap the vanilla Milvus search API so that the metadata is displayed after the search.
SAMPLE_QUESTION = "Dystopia science fiction with a panther"
TOP_K = 1
# Metadata filter
expression = 'rating >= 7'
# Infix string match.
expression = expression + ' && title like "%Panther%"'
formatted_results, context, context_metadata = \
    mc_run_search(SAMPLE_QUESTION, expression, TOP_K)
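The wrapper `mc_run_search` lives in the full notebook and is not reproduced here. As a rough idea of what such a wrapper might look like, the sketch below embeds the question, calls `MilvusClient.search` with the filter expression, and separates the chunk text from the metadata. The function name, its parameters, and the stub client are assumptions for illustration, not the notebook's actual code.

```python
def run_filtered_search(client, embed, question, expression, top_k,
                        collection_name, output_fields):
    """Embed the question, run a filtered vector search, and split text from metadata."""
    query_vector = embed(question)
    results = client.search(
        collection_name,
        data=[query_vector],
        filter=expression,
        limit=top_k,
        output_fields=output_fields,
    )
    hits = results[0]  # one list of hits per query vector
    context = [hit["entity"].get("chunk") for hit in hits]
    context_metadata = [
        {key: value for key, value in hit["entity"].items() if key != "chunk"}
        for hit in hits
    ]
    formatted_results = [(hit["distance"], hit["entity"].get("title")) for hit in hits]
    return formatted_results, context, context_metadata


class StubClient:
    """Stand-in for MilvusClient so the sketch runs without a database."""

    def search(self, collection_name, data, filter, limit, output_fields):
        self.last_filter = filter
        entity = {"chunk": "some review text", "title": "Black Panther", "rating": 7.3}
        return [[{"id": 1, "distance": 0.9, "entity": entity}][:limit]]


stub = StubClient()
formatted, context, metadata = run_filtered_search(
    stub,
    embed=lambda text: [0.1, 0.2, 0.3],  # placeholder for the real embedding model
    question="Dystopia science fiction with a panther",
    expression='rating >= 7 && title like "%Panther%"',
    top_k=1,
    collection_name="movies",  # hypothetical collection name
    output_fields=["chunk", "title", "rating"],
)
print(formatted, metadata)
```

Replacing `StubClient()` with a real `MilvusClient` and `embed` with the embedding model's encode function would run the same flow against actual data.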
Resources and further reading
Milvus Quickstart Guide
Using Array Fields | Milvus Documentation
https://github.com/milvus-io/pymilvus/blob/2.4/examples/fuzzy_match.py
https://milvus.io/docs/boolean.md#Usage
https://milvus.io/docs/single-vector-search.md#Filtered-search