Building a RAG Pipeline with Milvus and Haystack 2.0
This guide will demonstrate the integration of Milvus and Haystack 2.0 to build a powerful question-answering application.
Read the entire series
- Effortless AI Workflows: A Beginner's Guide to Hugging Face and PyMilvus
- Building a RAG Pipeline with Milvus and Haystack 2.0
- How to Pick a Vector Index in Your Milvus Instance: A Visual Guide
- Semantic Search with Milvus and OpenAI
- Efficiently Deploying Milvus on GCP Kubernetes: A Guide to Open Source Database Management
- Building RAG with Snowflake Arctic and Transformers on Milvus
- Vectorizing JSON Data with Milvus for Similarity Search
- Building a Multimodal RAG with Gemini 1.5, BGE-M3, Milvus Lite, and LangChain
Introduction
In natural language processing (NLP), Milvus and Haystack have emerged as powerful tools for building advanced applications. As the volume of unstructured data grows exponentially, organizations seek efficient and effective ways to extract valuable insights from their document collections. Traditional search methods often fall short of capturing the semantic meaning and context of the content, leading to suboptimal results and limited usability. This is where the combination of Milvus, a vector database, and Haystack 2.0, an open-source framework for building production-ready LLM applications, comes into play. By leveraging the strengths of these cutting-edge technologies, developers and data scientists can unlock the true potential of their document repositories, enabling faster and more accurate information retrieval, semantic search, and knowledge discovery.
This article will explore how Milvus and Haystack 2.0 can seamlessly integrate and how to utilize this integration to build powerful retrieval augmented generation (RAG) applications. By leveraging Milvus' vector indexing capabilities and Haystack's RAG framework, we will show how to efficiently process, store, and retrieve documents to generate accurate answers to user queries.
What is Milvus?
Milvus is an open-source vector database, maintained by developers at Zilliz, that is designed to handle high-dimensional vectors efficiently. In the context of natural language processing (NLP) and document retrieval, vectors are numerical representations of text or other data points in a high-dimensional space. These vectors capture the semantic meaning and relationships between data points, allowing for similarity-based searches and comparisons.
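For example, here is a minimal sketch of turning a sentence into such a vector with the sentence-transformers library (the same model family this guide uses later); the model name and sample text are illustrative only.

from sentence_transformers import SentenceTransformer

# Load a small, general-purpose embedding model (also used later in this guide).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode a sentence into a 384-dimensional vector that captures its meaning.
embedding = model.encode("Milvus is an open-source vector database.")
print(embedding.shape)  # (384,)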
When it comes to storing and retrieving embedded documents, Milvus offers several key advantages:
1. High-Dimensional Vector Indexing: Milvus is optimized for indexing and searching high-dimensional vectors. It utilizes advanced indexing techniques, such as hierarchical navigable small world graphs (HNSW) or inverted file (IVF) indexes, to create efficient index structures. These index structures enable fast similarity searches even in large-scale datasets with millions or billions of vectors.
2. Scalability and Performance: Milvus is designed to handle massive amounts of vector data while maintaining high performance. It can scale horizontally across multiple nodes or machines, allowing for distributed storage and parallel processing. This scalability ensures that Milvus can accommodate growing datasets and high query throughput without compromising search speed or accuracy.
3. Multi-vector and Hybrid Search: Users can include up to 10 vector fields in a single collection, each representing a different aspect of the data with its own embedding model or data type. This enriches search: you can find the most similar items based on attributes such as metadata or on vector embeddings representing images, audio, text, and more. Multi-vector search executes queries across these fields and merges the results using reranking strategies such as Reciprocal Rank Fusion (RRF) and weighted scoring.
4. Integration with Machine Learning models: Milvus integrates with popular machine-learning models from OpenAI, Cohere, and VoyageAI. This integration allows you to easily convert unstructured data into vector embeddings in Milvus for efficient retrieval.
5. Support for Multiple Indexes and Distance Metrics: Milvus supports various indexing algorithms and distance metrics to cater to different use cases and data characteristics. Depending on your specific requirements, you can choose from indexing options such as FLAT, IVF variants, and HNSW. Additionally, Milvus supports distance metrics such as Euclidean distance, cosine similarity, and inner product, allowing you to select the most appropriate metric for your application.
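As a quick illustration of this last point, here is a minimal sketch of creating an HNSW index with a cosine metric through pymilvus's MilvusClient; the collection name ("docs") and vector field name ("embedding") are hypothetical placeholders.

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Describe the index: HNSW over a hypothetical "embedding" field, compared with cosine similarity.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 200},
)

# Build the index on a hypothetical "docs" collection.
client.create_index(collection_name="docs", index_params=index_params)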
Beyond indexing, Milvus excels at storing and retrieving embedded documents in several ways:
- Efficient Storage: Milvus stores high-dimensional vectors efficiently by using compressed representations and optimized data structures. This reduces storage overhead while maintaining the integrity of the vector data.
- Fast Retrieval: With its advanced indexing techniques and query optimization, Milvus enables lightning-fast retrieval of similar documents based on their vector representations. It can search through millions of vectors and return the most relevant documents in milliseconds.
- Semantic Similarity: By leveraging the power of vector embeddings, Milvus allows you to retrieve documents based on their semantic similarity. Instead of relying solely on exact keyword matches, Milvus considers the semantic meaning captured by the vectors, enabling more contextual and meaningful search results.
- Dynamic Updates: Milvus supports dynamic updates and insertions of new vector embeddings without requiring a complete reindexing of the entire database. This ensures that your collection can grow and evolve while maintaining efficient search performance.
By combining high-dimensional vector indexing, scalability, hybrid search capabilities, and efficient storage and retrieval mechanisms, Milvus provides a powerful solution for storing and retrieving embeddings. It enables applications like semantic search, recommendation systems, and content-based retrieval to deliver accurate and relevant results in real time.
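To make the retrieval side concrete, here is a minimal sketch of a similarity search with pymilvus's MilvusClient; the collection name, output field, and query vector below are hypothetical placeholders.

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# A stand-in for the embedding of a user query (for example, a 384-dimensional
# vector from all-MiniLM-L6-v2); in practice it comes from your embedding model.
query_vector = [0.1] * 384

# Return the three most similar documents plus a hypothetical "text" field.
results = client.search(
    collection_name="docs",
    data=[query_vector],
    limit=3,
    output_fields=["text"],
)
print(results)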
What is Haystack 2.0?
Haystack (by deepset.ai) is an open-source Python framework for building production-ready LLM applications, retrieval-augmented generative pipelines, and search systems that work intelligently over large document collections. Haystack 2.0 is a major rework of the previous version that allows you to implement composable AI systems that are easy to use, customize, extend, optimize, evaluate, and ultimately deploy to production.
At the core of Haystack are its components—fundamental building blocks that can perform tasks like document retrieval, text generation, or summarization. While Haystack offers many components you can use out of the box, it also lets you create your own custom components, which is as simple as writing a Python class. You can connect components together to build pipelines, which are the foundation of LLM application architecture in Haystack.
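For instance, here is a minimal sketch of a custom Haystack 2.0 component; the component itself (a toy whitespace trimmer) is made up for illustration, but the @component decorator, output type declaration, and run method follow the standard pattern.

from haystack import component

@component
class WhitespaceCleaner:
    """A toy custom component that trims leading and trailing whitespace."""

    @component.output_types(cleaned=str)
    def run(self, text: str):
        # Every component returns a dictionary keyed by its declared outputs.
        return {"cleaned": text.strip()}

Once defined, it can be added to a pipeline with add_component and connected like any built-in component.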
Pipelines: RAG and beyond
Haystack pipelines are powerful abstractions that define how data flows through your LLM application. A pipeline is built from connected components and is essentially a graph, or even a multigraph: a single component with multiple outputs can connect to one or more components with multiple inputs.
To get you started, Haystack offers many example pipelines for different use cases: indexing, extractive QA, web search, and more. In this post, we’ll focus on RAG pipelines, which provide contextually relevant documents to an LLM based on a user’s query so it can generate better answers, as well as an indexing pipeline to ingest the documents.
Haystack 2.0 also provides a range of other features and capabilities that enhance the development of LLM applications.
- Integration with data sources for retrieval augmentation from anywhere on the web
- Advanced dynamic templates for LLM prompting via the Jinja2 templating language
- Cleaning and preprocessing functions for various data formats and sources
- Specialized evaluation tools that use different metrics to evaluate the entire system or its individual components
- A Hayhooks module to serve Haystack pipelines through HTTP endpoints
- A customizable logging system that supports structured logging and tracing correlation out of the box
- Code instrumentation that collects spans and traces at strategic points in the execution path, with support for OpenTelemetry and Datadog already in place
Integration with Milvus
One of Haystack 2.0's key advantages is its seamless integration with Milvus, a vector database that indexes and searches high-dimensional vectors. Developers can create question-answering applications by combining Haystack 2.0's RAG pipeline with Milvus' vector storage and retrieval capabilities. Milvus enables fast and scalable similarity search over large document collections, allowing the retriever component to identify the most relevant documents for a given query. This integration unlocks the potential for building highly efficient and accurate QA systems that handle massive amounts of unstructured textual data.
Setup and Installation
To get started, ensure that you have Python installed (version 3.8 or later). If you do not, you can download it from https://www.python.org. Once you have Python installed, you can install all the packages you’ll need into your Python environment using the following command:
pip install --upgrade pymilvus milvus-haystack markdown-it-py mdit_plain pypdf sentence-transformers
Note: If the "pip" command does not work, either your command prompt is not in the environment where you installed Python, or you need to restart your computer so your system "knows" where to find Python based on the system's environment variables.
Important: Things can change quickly in the LLM world. Refer to Haystack’s tutorials for the up-to-date code samples.
Back on track: Document storage
A document store plays a crucial role in NLP applications, serving as a centralized repository for storing and managing documents. This guide will use Milvus as our document store, leveraging its high-performance vector similarity search capabilities. Let's get started.
Building the Indexing Pipeline
The indexing pipeline is responsible for processing and storing documents in the MilvusDocumentStore. Here's a step-by-step guide to setting up the indexing pipeline.
Before you start coding, add some sample files to a subfolder of your project called “recipe_files.” In this example, we used a handful of vegan recipe files; you can use any files you want if you modify the file paths in the code to contain the correct filenames.
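If you just want something to index while following along, a placeholder file or two will do; this sketch (with made-up content) simply creates the folder and one sample text file. Swap in your own recipes or documents.

from pathlib import Path

# Create the folder the indexing pipeline will read from.
Path("recipe_files").mkdir(exist_ok=True)

# Write a small placeholder recipe; replace this with your own .txt, .md, or .pdf files.
Path("recipe_files/sample_recipe.txt").write_text(
    "Vegan Hemp Cheese\nBlend soaked sunflower seeds, hemp hearts, nutritional yeast, and salt."
)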
1. Initialize the MilvusDocumentStore:
from milvus_haystack import MilvusDocumentStore
document_store = MilvusDocumentStore(
    connection_args={
        "uri": "http://localhost:19530",  # Your Milvus service URI
    },
    drop_old=True,
)
Please note: For the above code and all code that follows, we highly recommend adding error checking, handling, and logging. In practice, document indexing and retrieval operations could fail for various reasons, such as network issues, configuration errors, or data format problems. Adding error-handling routines can make the script more robust and easier to troubleshoot.
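As one example of what that might look like, here is a sketch that wraps the document store initialization above in basic error handling and logging; adapt it to your own logging setup.

import logging

from milvus_haystack import MilvusDocumentStore

logger = logging.getLogger(__name__)

try:
    document_store = MilvusDocumentStore(
        connection_args={"uri": "http://localhost:19530"},
        drop_old=True,
    )
except Exception:
    # Connection problems, bad URIs, and schema mismatches surface here.
    logger.exception("Failed to initialize the Milvus document store")
    raise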
2. Configure the necessary components for document processing:
from haystack.components.writers import DocumentWriter
from haystack.components.converters import MarkdownToDocument, PyPDFToDocument, TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter, DocumentCleaner
from haystack.components.routers import FileTypeRouter
from haystack.components.joiners import DocumentJoiner
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack import Pipeline
# Initialize the components you’ll need in your indexing pipeline
file_type_router = FileTypeRouter(mime_types=["text/plain", "application/pdf", "text/markdown"])
text_file_converter = TextFileToDocument()
markdown_converter = MarkdownToDocument()
pdf_converter = PyPDFToDocument()
document_joiner = DocumentJoiner()
document_cleaner = DocumentCleaner()
document_splitter = DocumentSplitter(split_by="word", split_length=150, split_overlap=50)
document_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
document_writer = DocumentWriter(document_store)
3. Index the documents:
Add the components to the pipeline. Connect them to specify each component's inputs and outputs and the order in which they’ll be run.
preprocessing_pipeline = Pipeline()
preprocessing_pipeline.add_component(instance=file_type_router, name="file_type_router")
preprocessing_pipeline.add_component(instance=text_file_converter, name="text_file_converter")
preprocessing_pipeline.add_component(instance=markdown_converter, name="markdown_converter")
preprocessing_pipeline.add_component(instance=pdf_converter, name="pypdf_converter")
preprocessing_pipeline.add_component(instance=document_joiner, name="document_joiner")
preprocessing_pipeline.add_component(instance=document_cleaner, name="document_cleaner")
preprocessing_pipeline.add_component(instance=document_splitter, name="document_splitter")
preprocessing_pipeline.add_component(instance=document_embedder, name="document_embedder")
preprocessing_pipeline.add_component(instance=document_writer, name="document_writer")
preprocessing_pipeline.connect("file_type_router.text/plain", "text_file_converter.sources")
preprocessing_pipeline.connect("file_type_router.application/pdf", "pypdf_converter.sources")
preprocessing_pipeline.connect("file_type_router.text/markdown", "markdown_converter.sources")
preprocessing_pipeline.connect("text_file_converter", "document_joiner")
preprocessing_pipeline.connect("pypdf_converter", "document_joiner")
preprocessing_pipeline.connect("markdown_converter", "document_joiner")
preprocessing_pipeline.connect("document_joiner", "document_cleaner")
preprocessing_pipeline.connect("document_cleaner", "document_splitter")
preprocessing_pipeline.connect("document_splitter", "document_embedder")
preprocessing_pipeline.connect("document_embedder", "document_writer")
4. Run the pipeline:
from pathlib import Path
output_dir = "recipe_files"
preprocessing_pipeline.run({"file_type_router": {"sources": list(Path(output_dir).glob("**/*"))}})
print(document_store.count_documents())
During indexing, the documents are processed, converted to text, split into chunks, and embedded into high-dimensional vectors. These embedding vectors are crucial for efficient retrieval based on semantic similarity.
Integrating the RAG Pipeline
The RAG (retrieval-augmented generation) pipeline combines document retrieval with answer generation using a large language model (LLM). For this example, you’ll need an OpenAI API key. Set it as an environment variable named OPENAI_API_KEY. For the full list of models Haystack supports, see their documentation.
Here's how to set up the RAG pipeline:
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from milvus_haystack.milvus_embedding_retriever import MilvusEmbeddingRetriever
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
template = """
Answer the questions based on the given context.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ question }}
Answer:
"""
rag_pipeline = Pipeline()
rag_pipeline.add_component("embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
rag_pipeline.add_component("retriever", MilvusEmbeddingRetriever(document_store=document_store))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
rag_pipeline.add_component("llm", OpenAIGenerator(api_key=Secret.from_token(os.getenv("OPENAI_API_KEY")),
generation_kwargs={"temperature": 0}))
rag_pipeline.connect("embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
question = (
    "What ingredients would I need to make vegan keto eggplant lasagna, vegan persimmon flan, and vegan hemp cheese?"
)

response = rag_pipeline.run(
    {
        "embedder": {"text": question},
        "prompt_builder": {"question": question},
    }
)
print(response)
{'llm': {'replies': ['To make vegan keto eggplant lasagna, you would need ingredients such as eggplants, basil, almonds, nutritional yeast, olive oil, tofu, spinach, lemon, garlic powder, macadamia nuts, agar agar, and vegan mozzarella.\n\nTo make vegan persimmon flan, you would need ingredients such as persimmon pulp, cornstarch, agar agar, agave nectar, granulated sugar, coconut creme, almond milk, and vanilla.\n\nTo make vegan hemp cheese, you would need ingredients such as sunflower seeds, hemp hearts, miso paste, nutritional yeast, rejuvelac, and salt.'], 'meta': [{'model': 'gpt-3.5-turbo-0125', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 130, 'prompt_tokens': 3016, 'total_tokens': 3146}}]}}
Also, as mentioned earlier, the AI landscape changes rapidly, so we recommend treating the code above and throughout this document as examples. Please refer to the latest documentation and examples on Haystack's extensive resource pages. Note the resources listed at the end of this article.
The RAG pipeline retrieves relevant documents based on the query, generates a prompt for the OpenAIGenerator using the retrieved documents, and generates an answer using the LLM. The generated answer is then returned as the final output.
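If you only need the generated answer rather than the full response dictionary, you can pull it out of the structure shown above:

# The generator's replies live under the "llm" component's output.
print(response["llm"]["replies"][0])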
Conclusion
Integrating Milvus and Haystack 2.0 provides a powerful framework for building efficient and accurate question-answering applications. By leveraging Milvus' vector indexing capabilities and Haystack's retrieval-augmented generation pipeline, you can create systems that effectively process, store, and retrieve documents to generate relevant answers to user queries. You can begin building advanced NLP applications using these technologies with the steps outlined in this guide.
Resources
- Milvus documentation: https://milvus.io/docs
- Milvus community (GitHub, Discord, Reddit, Twitter): https://milvus.io/community
- Haystack documentation: https://docs.haystack.deepset.ai/docs/intro
- Haystack tutorials: https://haystack.deepset.ai/tutorials
- Haystack API reference: https://docs.haystack.deepset.ai/reference
- Haystack Discord: https://discord.com/invite/VBpFzsgRVF