Building an AI Agent for RAG with Milvus and LlamaIndex

In 2023, there was a massive explosion in the popularity of large language models (LLMs), and of LLM applications as a result. The two most popular types of LLM applications to come to the forefront are retrieval augmented generation (RAG) and AI agents. RAG uses a vector database like Milvus to inject your contextual data into an LLM; AI Agents use LLMs to call other tools. You can find the notebook on GitHub.
Note that this tutorial uses OpenAI for both the embeddings and the LLM, so you need an OpenAI API key set up to follow along.
In this article, we’re going to combine the two. We cover:
- The Tech Stack: Milvus and LlamaIndex
  - Milvus Lite
  - Milvus on Docker Compose
  - LlamaIndex
- Building an AI Agent for RAG
  - Spinning up Milvus
  - Loading the Data into Milvus via LlamaIndex
  - Creating the Query Engine Tools for the AI Agent
  - Creating the AI Agent for RAG
- Summary of Building an AI Agent for RAG with Milvus and LlamaIndex
Introduction to Milvus and LlamaIndex for RAG
Milvus and LlamaIndex are two powerful tools that can be used together to build a Retrieval-Augmented Generation (RAG) system. Milvus is a vector database that allows for efficient storage and querying of large amounts of vector data, while LlamaIndex is a data framework that provides a simple, flexible way to connect your data, including Milvus and other vector databases, to LLMs. By combining these two tools, developers can build RAG systems that efficiently retrieve relevant context and generate text grounded in it.
Milvus excels in handling large-scale vector data, making it ideal for applications that require high-performance similarity search. On the other hand, LlamaIndex simplifies the process of indexing and querying this data, providing a seamless interface for developers. Together, they form a robust foundation for RAG systems, enabling the retrieval of relevant information and the generation of contextually accurate responses.
The Tech Stack: Milvus Vector Store and LlamaIndex
For this AI Agent that does RAG, we actually use three technologies: Milvus, LlamaIndex, and OpenAI. OpenAI doesn’t need an introduction. You can also use OctoAI or a HuggingFace LLM as a drop-in. We may update this with a version using either of those later.
There are many ways we can run Milvus. For this tutorial, we cover two: Milvus Lite, which we can spin up directly in our Jupyter notebook, and Milvus through Docker. Milvus Lite can be installed from PyPI using pip install milvus. It can then be spun up, used, and spun down directly in your notebook. Either way, the Milvus vector store lets us load our data and run similarity searches against it with query vectors.
Milvus is a distributed system, so naturally, it makes sense to spin up Milvus with Docker Compose. The Docker Compose file is available in the Milvus documentation and in the Milvus GitHub repository. When you spin up Milvus with Docker Compose, you will see three containers, and you connect to Milvus through port 19530 by default.
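As a quick sanity check, you can confirm the Docker Compose deployment is reachable from Python using pymilvus. A minimal sketch; the host and port assume the default local deployment:
from pymilvus import connections, utility
# connect to the Milvus instance started by Docker Compose
connections.connect(alias="default", host="localhost", port="19530")
# if the connection works, this prints the Milvus server version
print(utility.get_server_version())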
LlamaIndex
LlamaIndex is one of the most popular frameworks for building LLM apps, alongside LangChain and Haystack. LlamaIndex's main focus is to provide a framework specialized for retrieval tasks. In addition, it provides tools for building AI Agents.
There are many ways to build RAG and AI Agents. This example builds on my original tutorial on a RAG AI Agent. The primary difference between these two examples is that this one uses Milvus as a persistent vector store with LlamaIndex. The LlamaIndex imports remain the same - the three imports we get from the core piece of LlamaIndex are the directory reader, the vector store index, and the storage context.
We also grab the query engine tool and the tool metadata object to create and describe our tool for RAG. Compared to the last version, the additional import we make here is the MilvusVectorStore object for LlamaIndex. To get all of these, run pip install -U llama-index llama-index-vector-stores-milvus pymilvus llama-index-llms-openai llama-index-readers-file.
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.vector_stores.milvus import MilvusVectorStore
A critical step in doing RAG with a vector database backend is spinning up our vector database. Milvus is a distributed vector database, and it spins up in two ways. First, we can directly import the default_server and start() it. Second, we can spin it up via Docker Compose. The code for both is shown below.
# spin up Milvus Lite in-process
from milvus import default_server
default_server.start()
Or run docker compose up -d in the terminal from the same directory as your docker-compose.yml file.
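If you went the Milvus Lite route, you can also point pymilvus at the Lite server and shut it down when you are finished. A minimal sketch using the default_server API from the milvus package:
from pymilvus import connections
# Milvus Lite picks its own port; read it from the server object
connections.connect(host="127.0.0.1", port=default_server.listen_port)
# ...do your work, then stop the in-process server
default_server.stop()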
Whichever way you run Milvus, the code that follows is the same; the only difference between the Milvus collections that hold our data is the collection name. You can also set the consistency level, which controls the tradeoff between read freshness and performance; the LlamaIndex Milvus integration defaults to 'Strong'.
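For example, you can set the consistency level explicitly when constructing the vector store. A short sketch, where "example" is a hypothetical collection name and "Strong" mirrors the default:
# explicitly request strong consistency for reads
vector_store = MilvusVectorStore(
    dim=1536,
    collection_name="example",
    consistency_level="Strong",
)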
Loading the Document Data into Milvus via LlamaIndex
The first step to building a RAG app is loading your data into your vector database. For this tutorial, we do this via LlamaIndex. We use their simple directory reader to read the input files; in this case, we are looking at the financial documents (10-K filings) for Lyft and Uber. Later, when we index this data, a collection is created in Milvus where it is structured and indexed for retrieval.
# load data
lyft_docs = SimpleDirectoryReader(
    input_files=["./data/10k/lyft_2021.pdf"]
).load_data()
uber_docs = SimpleDirectoryReader(
    input_files=["./data/10k/uber_2021.pdf"]
).load_data()
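Before indexing, it can help to sanity-check what the reader produced. A quick sketch; each entry is a LlamaIndex Document object carrying text and metadata:
# each PDF page becomes one Document object
print(len(lyft_docs), "Lyft documents loaded")
print(lyft_docs[0].metadata)    # e.g. file name and page label
print(lyft_docs[0].text[:200])  # preview the first page's text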
Once the data is loaded, we need to create a MilvusVectorStore object in LlamaIndex to hold it. We use OpenAI for both the embedding model and the LLM in this example, so we pass a dimension of 1536. The only difference between the two vector stores is the collection name; passing overwrite=True replaces any existing collection with that name. If you use Milvus Lite instead of Milvus through Docker Compose, you should also pass the host and port to ensure that you connect to the right server. We get the port from the default server through its listen_port option.
# build index
vector_store_lyft = MilvusVectorStore(dim=1536, collection_name="lyft", overwrite=True)
vector_store_uber = MilvusVectorStore(dim=1536, collection_name="uber", overwrite=True)

# for Milvus Lite users: set this flag if you started default_server above
milvuslite = False
if milvuslite:
    vector_store_lyft = MilvusVectorStore(host="localhost", port=default_server.listen_port, dim=1536, collection_name="lyft", overwrite=True)
    vector_store_uber = MilvusVectorStore(host="localhost", port=default_server.listen_port, dim=1536, collection_name="uber", overwrite=True)
We have to be able to pass the data around, not just load it. LlamaIndex handles this abstraction by providing a StorageContext object. We pass the vector store into the storage context for both Lyft and Uber to be able to pass that data around. Then, we create indexes on those vector stores. The last two lines in this block persist those vector stores on your local disk so you can retrieve them from storage next time.
storage_context_lyft = StorageContext.from_defaults(vector_store=vector_store_lyft)
storage_context_uber = StorageContext.from_defaults(vector_store=vector_store_uber)
lyft_index = VectorStoreIndex.from_documents(lyft_docs, storage_context=storage_context_lyft)
uber_index = VectorStoreIndex.from_documents(uber_docs, storage_context=storage_context_uber)
# persist index
lyft_index.storage_context.persist(persist_dir="./storage/lyft")
uber_index.storage_context.persist(persist_dir="./storage/uber")
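On a later run, you can rebuild an index straight from the Milvus collection instead of re-reading and re-embedding the PDFs. A minimal sketch, assuming the "lyft" collection already exists:
# reconnect to the existing collection; overwrite=False keeps the data
vector_store_lyft = MilvusVectorStore(dim=1536, collection_name="lyft", overwrite=False)
lyft_index = VectorStoreIndex.from_vector_store(vector_store=vector_store_lyft)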
Creating the Query Engine Tools for the AI Agent
For an AI Agent to do RAG, it needs to be able to use tools to perform querying on vector databases. In this next block, we turn the vector indexes we created into query engines that retrieve the top 3 most similar results.
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
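Before handing the engines to an agent, you can confirm each one works on its own; the question below is just an illustration:
# query one engine directly to verify retrieval end to end
response = lyft_engine.query("What was Lyft's revenue in 2021?")
print(str(response))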
We then turn the query engines into tools. The ReAct agent for LlamaIndex (imported in the next section) takes a list of tools as part of its input. So, in this section, we create that list of tools. We only need to create two tools, both with nearly identical structures. The Query Engine Tool requires a query engine and metadata. The metadata is used to name the tool and tell the LLM what it does and how to use it.
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
Everything is prepared for the last step: putting the pieces of the agent together. Here, we import the ReAct Agent and OpenAI from LlamaIndex. We define the LLM as GPT-3.5, although you can use any LLM you want. For the agent, we pass our list of tools from earlier, the LLM, and, in this particular example, verbose=True so we can see its "thoughts".
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo-0613")
agent = ReActAgent.from_tools(
query_engine_tools,
llm=llm,
verbose=True
)
response = agent.chat("What was Lyft's revenue growth in 2021?")
print(str(response))
When asked what Lyft's revenue growth in 2021 was, the agent should decide to call the lyft_10k tool with that question and answer with the growth figure reported in Lyft's 10-K.
Querying and Retrieving Data with Milvus and LlamaIndex
To query and retrieve data with Milvus and LlamaIndex, you need to create a Milvus collection and index it using LlamaIndex. A Milvus collection is a container that stores a set of vectors, and an index is a data structure that allows for efficient querying of the vectors. Once you have created a collection and indexed it, you can use LlamaIndex to query the collection and retrieve the relevant vectors.
To query a Milvus collection using LlamaIndex, you provide a text query and a similarity metric. LlamaIndex embeds the query into a query vector, and the similarity metric measures how close two vectors are. Milvus uses that metric to rank the vectors in the collection, and LlamaIndex returns the top-ranked results.
For example, if you want to retrieve documents that are similar to a given document, you can create a query vector that represents the document and use LlamaIndex to query the collection. LlamaIndex will return a list of documents that are similar to the query document, along with their similarity scores. This process ensures that the most relevant documents are retrieved, providing a solid foundation for further analysis or text generation.
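In code, that retrieval step looks something like this. A minimal sketch using the retriever interface on the index we built earlier; the question is illustrative:
# retrieve the top 3 most similar chunks along with similarity scores
retriever = lyft_index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("How did Lyft's revenue change in 2021?")
for node in nodes:
    print(node.score, node.node.get_content()[:100])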
Building and Integrating the AI Agent
To build and integrate an AI agent with Milvus and LlamaIndex, you need to create a knowledge base that stores the data the agent will use to generate text. The knowledge base can be a Milvus collection that stores a set of vectors, each representing a piece of text.
To integrate the AI agent with Milvus and LlamaIndex, you need to use LlamaIndex to query the knowledge base and retrieve the relevant vectors. The AI agent can then use these vectors to generate text based on a given prompt.
For example, if you want to build an AI agent that can answer questions about a particular topic, you can create a knowledge base that stores a set of vectors representing documents related to the topic. When the agent receives a question, it can use LlamaIndex to query the knowledge base and retrieve the relevant vectors. The agent can then use these vectors to generate an answer to the question.
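With the agent we built above, that flow looks like this; an illustrative question that requires both tools, so the agent should query each 10-K before answering:
# the ReAct agent picks the tool (or tools) it needs to answer
response = agent.chat("Compare the revenue growth of Uber and Lyft in 2021.")
print(str(response))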
Overall, Milvus and LlamaIndex provide a powerful combination of tools for building RAG systems and AI agents. By using these tools together, developers can build systems that can efficiently retrieve and generate text based on a given prompt, making them invaluable for applications in various domains, from customer support to content generation.
Summary of Building an AI Agent for Retrieval Augmented Generation with Milvus and LlamaIndex
In this article, we built an AI Agent for RAG using Milvus, LlamaIndex, and GPT-3.5. RAG is an LLM-based app architecture that uses vector databases to inject your data into an LLM as context. An AI Agent is an LLM-based application that can use other tools. To do RAG with an AI Agent, we provide it with the tools needed to do querying on a vector database.
In this tutorial, we showed how to build this architecture using sample data from Lyft and Uber. We first injected the data into Milvus so we could do RAG. Then, we turned our Milvus collections into query engines and the query engines into usable tools. Finally, we gave those tools to an AI Agent and used them to perform RAG on our documents.