What's New in Metadata Filtering in Milvus v2.4.3

Milvus v2.4.3 introduced full string metadata matching. You can now match strings using prefix, infix, postfix, or even single-character wildcard searches.
# Prefix example: matches any string starting with "The".
expression = 'title like "The%"'
# Infix example: matches any string containing "the" anywhere.
expression = 'title like "%the%"'
# Postfix example: matches any string ending with "Rye".
expression = 'title like "%Rye"'
# Single-character wildcard example: "_" matches exactly one character.
expression = 'title like "Flip_ed"'
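To make the wildcard rules concrete, here is a small pure-Python sketch that mimics how `%` (any run of characters, including none) and `_` (exactly one character) behave in a `like` pattern. The helpers `like_to_regex` and `like` are hypothetical names used only for illustration. The sketch assumes case-sensitive matching and ignores escape sequences, and it is not how Milvus evaluates filters internally.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE-style pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


# Illustrative titles, not taken from the dataset.
titles = ["The Catcher in the Rye", "Flipped", "Flip ed", "Over the Hedge"]

prefix_hits = [t for t in titles if like(t, "The%")]
infix_hits = [t for t in titles if like(t, "%the%")]
postfix_hits = [t for t in titles if like(t, "%Rye")]
wildcard_hits = [t for t in titles if like(t, "Flip_ed")]
```

Note that the prefix pattern does not match "Over the Hedge", because `The%` anchors the match at the start of the string.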
In previous blogs, I only covered prefix string matching. As of Milvus v2.4.3, all of these variations are possible, along with exact matches against array values and checks for whether any element of an array matches (contains_any()).
These updates make metadata filtering more versatile and powerful.
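As a rough sketch of what the array checks can look like as filter strings, the helpers below build expressions using the `ARRAY_CONTAINS` and `ARRAY_CONTAINS_ANY` operators from the Milvus array-field documentation. The helper names are hypothetical, and the sketch assumes that `genres` and `actors` are stored as ARRAY fields of strings. If your schema stores them as plain strings, a `like` filter would be needed instead.

```python
import json


def array_contains(field: str, value) -> str:
    """Build a filter that is true when the array field contains `value`."""
    return f"ARRAY_CONTAINS({field}, {json.dumps(value)})"


def array_contains_any(field: str, values: list) -> str:
    """Build a filter that is true when the array field contains any of `values`."""
    return f"ARRAY_CONTAINS_ANY({field}, {json.dumps(values)})"


# Illustrative values only; they are not taken from the dataset.
genre_filter = array_contains_any("genres", ["Sci-Fi", "Action"])
combined_filter = f'rating >= 7 && {array_contains("actors", "Example Actor")}'
```

Using `json.dumps` keeps string values double-quoted, so the whole expression can sit inside a single-quoted Python string.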
Let's walk through an example. For this blog, I'll use IMDB movie data downloaded from Kaggle.
# Import common libraries.
import sys, os, time, pprint
import pandas as pd
# Read the CSV data.
df = pd.read_csv("data/original_data.csv")
# Shorten the data for the demo.
df = df.tail(200)
# display() is available in Jupyter notebooks.
display(df.head())
Each movie has a 'text' field that holds its description and reviews.
Each movie also comes with metadata such as release year, rating, genres, actors, and a list of keywords.
Each "row" represents one chunk of movie review text, its vector representation, and metadata such as movie_id, movie title, poster link, genres, and actors.
We follow the usual RAG pattern:
1. Connect to Milvus: First, connect to Milvus Lite, a local deployment of Milvus. This is the database that stores and manages the vectors.
2. Convert the movie text to vectors: Take each movie's text field (description and reviews) and convert it to a vector, using the HuggingFace model BAAI/bge-large-en-v1.5.
3. Insert the vectors and metadata into Milvus: Insert each vector together with its original text (the "chunk") and its metadata (year, rating, genres, and so on).
4. Process the user's query: Convert the user's query to a vector with the same embedding model, then run an approximate nearest neighbor search to find the data vectors closest to the query vector.
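Step 4 can be pictured in miniature with the sketch below. It uses exact brute-force cosine similarity rather than the approximate nearest neighbor index that Milvus builds, and it works on made-up 3-dimensional vectors instead of real embeddings. The function names and sample rows are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(rows, query_vector, top_k=1, predicate=None):
    """Filter rows by metadata first, then rank the survivors by similarity."""
    candidates = [r for r in rows if predicate is None or predicate(r)]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


# Toy rows: made-up titles, ratings, and vectors.
rows = [
    {"title": "Movie A", "rating": 8.1, "vector": [0.9, 0.1, 0.0]},
    {"title": "Movie B", "rating": 6.2, "vector": [1.0, 0.0, 0.0]},
    {"title": "Movie C", "rating": 7.5, "vector": [0.0, 1.0, 0.0]},
]
query = [1.0, 0.0, 0.0]

unfiltered = brute_force_search(rows, query, top_k=1)
filtered = brute_force_search(
    rows, query, top_k=1, predicate=lambda r: r["rating"] >= 7
)
```

Without the filter, the closest vector wins regardless of rating. With the filter, the nearest neighbor is chosen only among rows that satisfy the metadata condition, which is the behavior the filter expressions in this post rely on.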
The full code is available on my GitHub.
First, connect to Milvus. You'll need to pip-install pymilvus. (Specifying only a local file name uses Milvus Lite, a local vector database. If you have another Milvus deployment, such as one on Docker or K8s, or the fully managed Zilliz Cloud, you can connect by specifying a URI and token instead. The rest of the code works the same way.)
# !python -m pip install -U pymilvus
import pymilvus
# Connect a client to the Milvus Lite server.
from pymilvus import MilvusClient
client = MilvusClient("milvus_demo.db")
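One possible way to switch between Milvus Lite and a remote deployment without touching the rest of the code is to pick the connection arguments from the environment. This is only a sketch: the environment variable names `MILVUS_URI` and `MILVUS_TOKEN` and the helper `milvus_connection_kwargs` are assumptions, not part of pymilvus.

```python
import os


def milvus_connection_kwargs(env=None):
    """Choose MilvusClient arguments: a remote URI if configured, else a local file."""
    env = os.environ if env is None else env
    uri = env.get("MILVUS_URI")  # hypothetical variable name
    if not uri:
        # A plain local file name selects Milvus Lite.
        return {"uri": "milvus_demo.db"}
    kwargs = {"uri": uri}
    token = env.get("MILVUS_TOKEN")  # hypothetical variable name
    if token:
        kwargs["token"] = token
    return kwargs


# Placeholder values for illustration only.
local_kwargs = milvus_connection_kwargs({})
remote_kwargs = milvus_connection_kwargs(
    {"MILVUS_URI": "https://example.invalid:19530", "MILVUS_TOKEN": "placeholder-token"}
)

# With pymilvus installed, the client could then be created as:
# from pymilvus import MilvusClient
# client = MilvusClient(**milvus_connection_kwargs())
```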
Next, chunk the text column containing the movie reviews and embed the chunks as vectors. Many resources already show how to do this, so I won't repeat that code here. Below, I show how to assemble the chunked text, vector representations, and metadata, and insert the data into Milvus.
# Build dict_list (one dict per chunk) in a single loop.
# Column names Director, Genres, Actors, Keywords are assumed to match the CSV.
dict_list = []
for id, title, chunk, vector, poster_url, director, \
    genres, actors, keywords, film_year, rating in zip(
        df.id, df.Name, chunks, converted_values, df.PosterLink,
        df.Director, df.Genres, df.Actors, df.Keywords,
        df.MovieYear, df.RatingValue):
    # Assemble the embedding vector, original text chunk, and metadata.
    chunk_dict = {
        'movie_index': id,
        'title': title,
        'chunk': chunk.page_content,
        'poster_url': poster_url,
        'director': director,
        'genres': genres,
        'actors': actors,
        'keywords': keywords,
        'film_year': film_year,
        'rating': rating,
        'vector': vector,
    }
    dict_list.append(chunk_dict)
# Insert the data into the Milvus collection.
# COLLECTION_NAME is assumed to be defined during collection setup (not shown here).
print("Start inserting entities")
start_time = time.time()
client.insert(
    COLLECTION_NAME,
    data=dict_list,
    progress_bar=True)
end_time = time.time()
print(f"Milvus insert time for {len(dict_list)} vectors: ", end="")
print(f"{round(end_time - start_time, 2)} seconds")
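The demo inserts everything in one call, which is fine for a couple of hundred movies. For larger datasets, one option is to insert in fixed-size batches. The sketch below assumes only that the client exposes `insert(collection_name, data=...)`, as `MilvusClient` does. The helper `insert_in_batches`, the batch size, and the `RecordingClient` stand-in are hypothetical and exist only so the example runs without a database.

```python
def insert_in_batches(client, collection_name, rows, batch_size=100):
    """Insert rows in slices of `batch_size` and return how many rows were sent."""
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        client.insert(collection_name, data=batch)
        inserted += len(batch)
    return inserted


class RecordingClient:
    """Stand-in for a real client; it only records the size of each insert call."""

    def __init__(self):
        self.calls = []

    def insert(self, collection_name, data):
        self.calls.append((collection_name, len(data)))


# Dummy rows for illustration only.
rows = [{"movie_index": i} for i in range(250)]
stub = RecordingClient()
total = insert_in_batches(stub, "demo_collection", rows, batch_size=100)
```

With a real client, the same helper could be called as `insert_in_batches(client, COLLECTION_NAME, dict_list)`.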
Now that the data is in Milvus, we're ready to search!
Search with a string metadata filter
Suppose you want to search for highly rated sci-fi movies that feature robots and depict a dystopian future. The sample data has metadata that can be used for this search.
Below is an example of metadata filtering with a fuzzy string match. The helper used here only wraps the vanilla Milvus search API so that the metadata is displayed after the search.
SAMPLE_QUESTION = "Dystopian sci-fi featuring robots"
TOP_K = 1
# Metadata filter.
expression = 'rating >= 7'
# Infix string match.
expression = expression + ' && title like "%Panther%"'
formatted_results, context, context_metadata = \
    mc_run_search(SAMPLE_QUESTION, expression, TOP_K)
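The body of `mc_run_search` is not shown in this post. As a rough idea of what such a wrapper might do, the sketch below calls `client.search` with a filter expression and splits each hit into text and metadata. It is not the author's implementation: the name `run_filtered_search` is hypothetical, it takes an already embedded query vector instead of the question text, and it assumes hits come back as dicts with `distance` and `entity` keys, as `MilvusClient.search` returns. The `StubClient` class only imitates that return shape so the example runs without a database.

```python
def run_filtered_search(client, collection_name, query_vector,
                        expression, top_k, output_fields):
    """Run one filtered vector search and separate chunk text from metadata."""
    results = client.search(
        collection_name,
        data=[query_vector],
        filter=expression,
        limit=top_k,
        output_fields=output_fields,
    )
    hits = results[0]  # one result list per query vector
    formatted = [(hit["distance"], hit["entity"]) for hit in hits]
    context = [hit["entity"].get("chunk", "") for hit in hits]
    context_metadata = [
        {k: v for k, v in hit["entity"].items() if k != "chunk"} for hit in hits
    ]
    return formatted, context, context_metadata


class StubClient:
    """Stand-in that imitates the shape of MilvusClient.search results."""

    def search(self, collection_name, data, filter="", limit=10, output_fields=None):
        self.last_filter = filter
        hit = {
            "id": 1,
            "distance": 0.42,
            "entity": {"title": "Example Title", "chunk": "Example chunk text.", "rating": 7.5},
        }
        return [[hit][:limit]]


stub = StubClient()
expression = 'rating >= 7 && title like "%Panther%"'
formatted, context, context_metadata = run_filtered_search(
    stub, "demo_collection", [0.1, 0.2, 0.3], expression,
    top_k=1, output_fields=["title", "chunk", "rating"],
)
```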
Resources and references
Milvus Quickstart Guide
Use Array Fields | Milvus Documentation
https://github.com/milvus-io/pymilvus/blob/2.4/examples/fuzzy_match.py
https://milvus.io/docs/boolean.md#Usage
https://milvus.io/docs/single-vector-search.md#Filtered-search