What's New in Metadata Filtering in Milvus v2.4.3

Milvus v2.4.3 introduced full string metadata matching! You can now match strings using prefix, infix, postfix, or even single-character wildcard searches.
# Prefix example: matches any string starting with "The".
expression = 'title like "The%"'
# Infix example: matches any string containing "the" anywhere in the text.
expression = 'title like "%the%"'
# Postfix example: matches any string ending with "Rye".
expression = 'title like "%Rye"'
# Single-character wildcard example.
expression = 'title like "Flip_ed"'
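To see which titles each pattern above would select, the wildcard rules can be emulated locally. This is only an illustrative sketch, not how Milvus evaluates filters: it assumes `%` matches any run of characters, `_` matches exactly one character, and matching is case-sensitive. The helper names `like_to_regex` and `like` and the sample titles are made up for the example.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)

def like(value: str, pattern: str) -> bool:
    """Return True if value matches the LIKE-style pattern."""
    return like_to_regex(pattern).match(value) is not None

# Hypothetical titles, used only to exercise the four patterns.
titles = ["The Matrix", "Catcher in the Rye", "Flipped", "Flip ed"]
for pattern in ["The%", "%the%", "%Rye", "Flip_ed"]:
    print(pattern, [t for t in titles if like(t, pattern)])
```

Running it locally is a quick way to sanity-check a pattern before putting it into a filter expression.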
In a previous blog, I covered only prefix string matching. Since Milvus v2.4.3, many more variations are possible, such as exact matching against array values and checking whether any element of an array matches (contains_any()).
These updates make metadata filtering more versatile and powerful!
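The "any element matches" check can be pictured with a small local stand-in. This sketch makes two assumptions: that `genres` is stored as an array field (the post does not say so), and that the filter is written in the `ARRAY_CONTAINS_ANY(field, [...])` form described in the Milvus array-field documentation. The helper names are hypothetical.

```python
import json

def contains_any(values, candidates):
    """Local stand-in: True if the array shares at least one element with candidates."""
    return any(v in candidates for v in values)

def contains_any_expr(field, candidates):
    """Build a filter string of the form ARRAY_CONTAINS_ANY(field, [...])."""
    return f"ARRAY_CONTAINS_ANY({field}, {json.dumps(list(candidates))})"

# Hypothetical array value for one movie.
movie_genres = ["Sci-Fi", "Thriller"]
print(contains_any(movie_genres, ["Sci-Fi", "Comedy"]))
print(contains_any(movie_genres, ["Romance"]))
print(contains_any_expr("genres", ["Sci-Fi", "Comedy"]))
```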
Let's walk through an example. In this blog, I use IMDB movie data downloaded from Kaggle.
# Import common libraries.
import sys, os, time, pprint
import numpy as np  # used later to round the insert timing
import pandas as pd
# Read the CSV data.
df = pd.read_csv("data/original_data.csv")
# Use a shorter slice of the data for the demo.
df = df.tail(200)
display(df.head())
Each movie has a 'text' field that holds its description and reviews.
Each movie also carries metadata such as release year, rating, genres, actors, and a list of keywords.
Each "row" represents one chunk of movie-review text, its vector representation, and metadata such as movie_id, movie title, poster link, genres, and actors.
We follow the usual RAG pattern:
1. Connect to Milvus: First, connect to Milvus Lite, a local deployment of Milvus. This is the database that stores and manages the vectors.
2. Convert the movie text to vectors: Take each movie's text field (description and reviews) and convert it into vectors, using the HuggingFace model BAAI/bge-large-en-v1.5.
3. Insert the vectors and metadata into Milvus: Insert each vector into Milvus together with its original text (called a "chunk") and its metadata (year, rating, genres, and so on).
4. Process the user's query: Convert the user's query into a vector with the same embedding model, then run an approximate nearest neighbor search to find the data vectors closest to the query vector.
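The four steps can be sketched end to end without any external services. In the toy version below, a word-count vector stands in for the embedding model and a Python list stands in for Milvus, so it only illustrates the flow (embed, store with metadata, filter, rank). The movie rows and the names `toy_embed`, `store`, and `search` are invented for the example.

```python
import math
from collections import Counter

def toy_embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Steps 2-3: embed each chunk and store it together with its metadata.
store = []
for title, chunk, rating in [
    ("Movie A", "a robot rules a dystopian future city", 8.1),
    ("Movie B", "a romantic comedy set in paris", 7.4),
    ("Movie C", "a robot uprising in a dystopian future", 5.2),
]:
    store.append({"title": title, "chunk": chunk, "rating": rating,
                  "vector": toy_embed(chunk)})

# Step 4: embed the query the same way, filter on metadata, rank by similarity.
def search(query, min_rating, top_k=1):
    q = toy_embed(query)
    candidates = [row for row in store if row["rating"] >= min_rating]
    candidates.sort(key=lambda row: cosine(q, row["vector"]), reverse=True)
    return candidates[:top_k]

print(search("dystopian future with a robot", min_rating=7)[0]["title"])
```

Movie C matches the query text well but is dropped by the rating filter, which is the effect metadata filtering has on a vector search.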
The full code is on my GitHub.
First, connect to Milvus. You need to pip install pymilvus. (Specifying only a local file name uses Milvus Lite, a local vector database. If you have another Milvus deployment, for example on Docker or K8s, or the fully managed Zilliz Cloud, you can connect by specifying a URI and token instead. The rest of the code works the same way.)
# !python -m pip install -U pymilvus
import pymilvus
# Connect a client to the Milvus Lite server.
from pymilvus import MilvusClient
client = MilvusClient("milvus_demo.db")
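Switching between the local file and a remote deployment can be reduced to choosing the arguments passed to `MilvusClient`. The sketch below is one possible way to do that. The environment variable names `MILVUS_URI` and `MILVUS_TOKEN` and the helper `milvus_client_kwargs` are assumptions made for this example, and the actual client call is left commented out so the snippet runs without pymilvus installed.

```python
import os

def milvus_client_kwargs(env=os.environ):
    """Return keyword arguments for MilvusClient based on environment variables.

    MILVUS_URI and MILVUS_TOKEN are hypothetical variable names for this sketch.
    """
    uri = env.get("MILVUS_URI")
    if not uri:
        # No server configured: fall back to a local Milvus Lite file.
        return {"uri": "milvus_demo.db"}
    kwargs = {"uri": uri}
    token = env.get("MILVUS_TOKEN")
    if token:
        kwargs["token"] = token
    return kwargs

print(milvus_client_kwargs({}))
# With pymilvus installed:
# from pymilvus import MilvusClient
# client = MilvusClient(**milvus_client_kwargs())
```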
Next, chunk the text column containing the movie reviews and embed the chunks as vectors. Many resources already show how to do this, so I won't repeat that code here. Below, I show how to assemble the chunked text, the vector representations, and the metadata, and insert the data into Milvus.
# Create chunk_list and dict_list in a single loop.
dict_list = []
for id, title, chunk, vector, poster_url, director, \
    genres, actors, keywords, film_year, rating in zip(
        df.id, df.Name, chunks, converted_values, df.PosterLink,
        df.Director, df.Genres, df.Actors, df.Keywords,
        df.MovieYear, df.RatingValue):
    # Assemble the embedding vector, the original text chunk, and the metadata.
    chunk_dict = {
        'movie_index': id,
        'title': title,
        'chunk': chunk.page_content,
        'poster_url': poster_url,
        'director': director,
        'genres': genres,
        'actors': actors,
        'keywords': keywords,
        'film_year': film_year,
        'rating': rating,
        'vector': vector,
    }
    dict_list.append(chunk_dict)
# Insert the data into the Milvus collection.
print("Start inserting entities")
start_time = time.time()
client.insert(
    COLLECTION_NAME,
    data=dict_list,
    progress_bar=True)
end_time = time.time()
print(f"Milvus insert time for {len(dict_list)} vectors: ", end="")
print(f"{np.round(end_time - start_time, 2)} seconds")
Now that the data is in Milvus, we're ready to search!
Search with a string metadata filter
Suppose we want to find highly rated science-fiction movies that feature a robot and depict a dystopian future. The sample data has metadata we can use for this search.
Below is an example of metadata filtering with fuzzy string matching. Here I simply wrap the vanilla Milvus search API so that it also displays the metadata after the search.
SAMPLE_QUESTION = "Dystopia science fiction with a robot."
TOP_K = 1
# Metadata filter.
expression = 'rating >= 7'
# Infix string match.
expression = expression + ' && title like "%Panther%"'
formatted_results, context, context_metadata = \
    mc_run_search(SAMPLE_QUESTION, expression, TOP_K)
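The post does not show the body of `mc_run_search`, so the following is only a guess at what such a wrapper could look like, under a different name to make that clear. It assumes the client exposes `search(collection_name, data=..., filter=..., limit=..., output_fields=...)` as `MilvusClient` does, and that each hit is a dict with an `entity` entry. The `FakeClient`, the `"demo"` collection name, the lambda embedding function, and the chosen output fields are placeholders so the sketch runs without a Milvus server.

```python
def run_search_sketch(client, embed_fn, collection_name, question, expression, top_k,
                      output_fields=("title", "chunk", "film_year", "rating")):
    """Embed the question, run a filtered vector search, and unpack the metadata."""
    query_vector = embed_fn(question)
    results = client.search(
        collection_name,
        data=[query_vector],
        filter=expression,
        limit=top_k,
        output_fields=list(output_fields),
    )
    hits = results[0]  # one list of hits per query vector
    contexts, metadata = [], []
    for hit in hits:
        entity = dict(hit["entity"])
        contexts.append(entity.pop("chunk", ""))  # the retrieved text chunk
        metadata.append(entity)                    # everything else is metadata
    return hits, contexts, metadata


class FakeClient:
    """Stand-in so the sketch runs without a Milvus server."""
    def search(self, collection_name, data, filter, limit, output_fields):
        print("filter:", filter)
        return [[{"id": 1, "distance": 0.9,
                  "entity": {"title": "Example", "chunk": "some text", "rating": 7.5}}]]


expression = 'rating >= 7' + ' && title like "%Panther%"'
hits, contexts, metadata = run_search_sketch(
    FakeClient(), lambda q: [0.0, 1.0], "demo", "a question", expression, 1)
for ctx, meta in zip(contexts, metadata):
    print(meta, ctx)
```

Passing the client and embedding function as arguments keeps the wrapper easy to test with a stand-in like the one above.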
Resources and references
Milvus Quickstart guide
Use Array Fields | Milvus documentation
https://github.com/milvus-io/pymilvus/blob/2.4/examples/fuzzy_match.py
https://milvus.io/docs/boolean.md#Usage
https://milvus.io/docs/single-vector-search.md#Filtered-search