DSPy and Zilliz Cloud Integration
DSPy and Zilliz Cloud integrate to build high-performance Retrieval-Augmented Generation (RAG) pipelines: DSPy contributes a programmatic framework for optimizing LLM prompts and weights, while Zilliz Cloud provides efficient, scalable vector search, yielding RAG applications that are optimized automatically.
What is DSPy
DSPy, introduced by the Stanford NLP Group, is a groundbreaking programmatic framework designed to optimize prompts and weights within language models, particularly valuable in scenarios where large language models (LLMs) are integrated across multiple stages of a pipeline. Unlike conventional prompt engineering techniques that rely on manual crafting and tweaking, DSPy adopts a learning-based approach. By learning from query-answer examples, DSPy dynamically generates optimized prompts tailored to specific tasks. This methodology enables entire pipelines to be reassembled seamlessly, eliminating the need for continuous manual prompt adjustment. DSPy's Pythonic syntax offers various composable and declarative modules, simplifying the instruction of LLMs.
By integrating with Zilliz Cloud (fully managed Milvus), DSPy gains access to a fully managed vector database through the MilvusRM retriever module, enabling developers to easily define and optimize RAG programs while taking advantage of Milvus' strong vector search capabilities for efficient and scalable retrieval.
Benefits of the DSPy + Zilliz Cloud Integration
- Simplified RAG implementation and configuration: The integration streamlines RAG pipeline setup by programmatically automating the optimization of vector retrieval, prompt design, and LLM fine-tuning, reducing manual adjustment requirements.
- Improved RAG performance and scalability: Zilliz Cloud delivers high-performance managed Milvus, ensuring efficient handling of large-scale data retrieval operations and making applications more robust and capable of managing extensive datasets.
- Programmatic prompt optimization: DSPy's learning-based approach dynamically generates optimized prompts tailored to specific tasks, eliminating the trial-and-error method of traditional prompt templates and significantly improving answer quality.
- Modularized abstraction: DSPy effectively abstracts intricate aspects of LM pipeline development such as decomposition, fine-tuning, and model selection, while Zilliz Cloud handles the vector storage and retrieval layer seamlessly.
How the Integration Works
DSPy provides a programmatic framework for building LLM pipelines using composable and declarative modules. It offers Signatures for defining input/output behavior, Modules for abstracting prompting techniques like chain of thought or ReAct, and Optimizers for fine-tuning parameters such as prompts and LLM weights to maximize specified metrics like accuracy.
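To make the Signature idea concrete, here is a framework-free sketch of how a declarative field specification can be rendered into a prompt. The `make_prompt` function and the dictionary layout are illustrative stand-ins, not DSPy's actual API (in DSPy itself, signatures are classes or shorthand strings such as `"question -> answer"`):

```python
# Toy sketch of the "signature" idea: declare input/output fields once,
# then render a prompt from the declaration. Not DSPy's real API.

def make_prompt(signature, **inputs):
    """Render a simple prompt from a declarative field spec."""
    lines = [signature["instructions"]]
    for name in signature["inputs"]:
        lines.append(f"{name.capitalize()}: {inputs[name]}")
    for name in signature["outputs"]:
        lines.append(f"{name.capitalize()}:")  # left blank for the LM to complete
    return "\n".join(lines)

qa_signature = {
    "instructions": "Answer questions with short factoid answers.",
    "inputs": ["question"],
    "outputs": ["answer"],
}

print(make_prompt(qa_signature, question="Who wrote At My Window?"))
```

An Optimizer's job, in these terms, is to rewrite pieces of the rendered prompt (instructions, demonstrations) automatically so that a chosen metric improves.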
Zilliz Cloud serves as the vector database layer through the MilvusRM retriever module in DSPy, storing and indexing document embeddings for fast similarity search. It enables the RAG pipeline to retrieve the most relevant context from large knowledge bases efficiently.
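As a rough illustration of what the retrieval layer computes, the following dependency-free sketch scores stored embeddings against a query vector by inner product (the same `IP` metric configured for the collection later in this guide) and returns the top-k passages. Zilliz Cloud performs the equivalent search at scale over indexed embeddings; all names and vectors below are made up for illustration:

```python
# Minimal sketch of inner-product (IP) top-k vector search.

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, docs, k=2):
    """docs: list of (text, embedding) pairs; returns the k best texts by IP score."""
    ranked = sorted(docs, key=lambda d: inner_product(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

docs = [
    ("passage about jazz", [0.9, 0.1]),
    ("passage about rock", [0.2, 0.8]),
    ("passage about folk", [0.7, 0.3]),
]
print(top_k([1.0, 0.0], docs, k=2))  # → ['passage about jazz', 'passage about folk']
```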
Together, DSPy and Zilliz Cloud create an automatically optimized RAG solution: data is ingested and embedded into Zilliz Cloud, DSPy's MilvusRM retrieves relevant context through vector similarity search, and the framework's optimizers iteratively refine prompts and parameters to enhance answer quality — all with minimal manual intervention.
Step-by-Step Guide
1. Install Dependencies
Install DSPy with Milvus support and PyMilvus:
```shell
$ pip install "dspy-ai[milvus]"
$ pip install -U pymilvus
```

2. Load the Dataset
Use the HotPotQA dataset, a collection of complex question-answer pairs, as our training dataset:
```python
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(
    train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0
)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs("question") for x in dataset.train]
devset = [x.with_inputs("question") for x in dataset.dev]
```

3. Ingest Data into the Milvus Vector Database
Ingest the context information into the Milvus collection for vector retrieval. The collection should have an `embedding` field and a `text` field. In this case, we use OpenAI's `text-embedding-3-small` model as the default query embedding function.

```python
import os

import requests
from pymilvus import MilvusClient

from dspy.retrieve.milvus_rm import openai_embedding_function

os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"
MILVUS_URI = "example.db"
MILVUS_TOKEN = ""

client = MilvusClient(uri=MILVUS_URI, token=MILVUS_TOKEN)

if "dspy_example" not in client.list_collections():
    client.create_collection(
        collection_name="dspy_example",
        overwrite=True,
        dimension=1536,
        primary_field_name="id",
        vector_field_name="embedding",
        id_type="int",
        metric_type="IP",
        max_length=65535,
        enable_dynamic=True,
    )

text = requests.get(
    "https://raw.githubusercontent.com/wxywb/dspy_dataset_sample/master/sample_data.txt"
).text

for idx, passage in enumerate(text.split("\n")):
    if len(passage) == 0:
        continue
    client.insert(
        collection_name="dspy_example",
        data=[
            {
                "id": idx,
                "embedding": openai_embedding_function(passage)[0],
                "text": passage,
            }
        ],
    )
```

4. Define MilvusRM and Configure DSPy
Define the MilvusRM retriever module and configure DSPy with the language model:
```python
import dspy
from dspy.retrieve.milvus_rm import MilvusRM

retriever_model = MilvusRM(
    collection_name="dspy_example",
    uri=MILVUS_URI,
    token=MILVUS_TOKEN,  # ignore this if no token is required for Milvus connection
    embedding_function=openai_embedding_function,
)

turbo = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=turbo)
```

5. Build Signatures and the RAG Pipeline
Define the signature for the task and build the RAG pipeline module:
```python
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


class RAG(dspy.Module):
    def __init__(self, rm):
        super().__init__()
        self.retrieve = rm
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(
            context=[item.long_text for item in context], answer=prediction.answer
        )
```

6. Execute the Pipeline and Get Results
Try out the RAG pipeline and evaluate the results:
```python
rag = RAG(retriever_model)
print(rag("who write At My Window").answer)
```

Evaluate the quantitative results on the dataset:
```python
from dspy.evaluate.evaluate import Evaluate

evaluate_on_hotpotqa = Evaluate(
    devset=devset, num_threads=1, display_progress=False, display_table=5
)
metric = dspy.evaluate.answer_exact_match
score = evaluate_on_hotpotqa(rag, metric=metric)
print("rag:", score)
```

7. Optimize the Pipeline
Compile the program to update parameters within each module and enhance performance:
```python
from dspy.teleprompt import BootstrapFewShot


def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM


teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = teleprompter.compile(rag, trainset=trainset)

# Evaluate the compiled RAG program.
score = evaluate_on_hotpotqa(compiled_rag, metric=metric)
print("compile_rag:", score)
```

The evaluation score increases from its previous value of 50.0 to 52.0, indicating an improvement in answer quality.
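For intuition, the gist of `BootstrapFewShot` can be sketched in plain Python: run the uncompiled program over training examples, keep only the traces that the metric validates, and reuse them as few-shot demonstrations in the compiled program's prompt. Everything below (`bootstrap_demos`, the toy program and metric) is an illustrative stand-in, not DSPy's actual implementation:

```python
# Plain-Python sketch of the bootstrap-few-shot idea: harvest demonstrations
# from the program's own successful runs. Not the real DSPy code.

def bootstrap_demos(program, metric, trainset, max_demos=4):
    demos = []
    for example in trainset:
        pred = program(example["question"])
        if metric(example, pred):  # keep only traces the metric validates
            demos.append((example["question"], pred))
        if len(demos) == max_demos:
            break
    return demos

# Toy program and metric, just to exercise the loop.
def toy_program(question):
    return question.upper()

def toy_metric(example, pred):
    return pred == example["answer"]

trainset_toy = [
    {"question": "a", "answer": "A"},
    {"question": "b", "answer": "X"},  # the metric rejects this trace
    {"question": "c", "answer": "C"},
]
print(bootstrap_demos(toy_program, toy_metric, trainset_toy))  # → [('a', 'A'), ('c', 'C')]
```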
Learn More
- Integrate Milvus with DSPy — Official Milvus tutorial for integrating with DSPy
- Exploring DSPy and Its Integration with Milvus for Crafting Highly Efficient RAG Pipelines — Zilliz blog on DSPy and Milvus integration
- Building RAG with Llama3, Ollama, DSPy, and Milvus — Zilliz tutorial on building RAG with DSPy
- DSPy GitHub Repository — Official DSPy source code
- Milvus GitHub Repository — Official Milvus source code