Building an AI Agent for RAG with Milvus and LlamaIndex
In 2023, there was a massive explosion in the popularity of large language models (LLMs) and, as a result, LLM applications. The two most popular types of LLM applications to come to the forefront are retrieval augmented generation (RAG) and AI agents. RAG is about using a vector database like Milvus to inject contextual data into an LLM, while AI agents are about giving LLMs the ability to use other tools. You can find the notebook for this tutorial on GitHub.
In this article, we’re going to combine the two. We cover:
The Tech Stack: Milvus and LlamaIndex
Milvus Lite
Milvus on Docker Compose
LlamaIndex
Building an AI Agent for RAG
Spinning up Milvus
Loading the Data into Milvus via LlamaIndex
Creating the Query Engine Tools for the AI Agent
Creating the AI Agent for RAG
Summary of Building an AI Agent for RAG with Milvus and LlamaIndex
The Tech Stack: Milvus and LlamaIndex
For this AI Agent that does RAG, we actually use three technologies: Milvus, LlamaIndex, and OpenAI. OpenAI doesn’t need an introduction. You can also use OctoAI or a HuggingFace LLM as a drop-in replacement. We may update this with a version using either of those later.
Milvus Lite
There are two ways we use Milvus in this tutorial: Milvus Lite, which we can spin up directly in our Jupyter notebook, and Milvus through Docker Compose. Milvus Lite can be installed from PyPI using pip install milvus. It can then be spun up, used, and spun down directly in your notebook.
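As a quick illustration, here is a minimal sketch of that lifecycle; the same default_server object appears again in the tutorial below, so the only new piece here is the stop() call at the end.

# minimal Milvus Lite lifecycle inside a notebook (assumes pip install milvus has been run)
from milvus import default_server

default_server.start()             # spin up the embedded server
print(default_server.listen_port)  # the port to hand to clients such as LlamaIndex
# ... load data and query here ...
default_server.stop()              # spin it back down when finished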
Milvus on Docker Compose
Milvus is a distributed system, so naturally, it makes sense to spin up Milvus with Docker Compose. The Docker Compose file is available in the Milvus documentation and on the Milvus GitHub. When you spin up Milvus with Docker Compose, you will see three containers and connect to Milvus through port 19530 by default.
LlamaIndex
LlamaIndex is one of the most popular frameworks for building LLM apps, alongside LangChain and Haystack. LlamaIndex's main focus is providing a framework specialized for retrieval tasks. In addition, it provides tools for building AI agents.
Building an AI Agent for RAG with Milvus
There are many ways to build RAG and AI Agents. This example builds on my original tutorial on a RAG AI Agent. The primary difference between these two examples is that this one uses Milvus as a persistent vector store with LlamaIndex. The LlamaIndex imports remain the same: the three imports we get from LlamaIndex core are the directory reader, the vector store index, and the storage context.
We also grab the query engine tool and the tool metadata object to create and describe our tool for RAG. Compared to the last version, the additional import we make here is the MilvusVectorStore object for LlamaIndex. To get all of these, run pip install -U llama-index llama-index-vector-stores-milvus pymilvus llama-index-llms-openai llama-index-readers-file.
# LlamaIndex core: data loader, vector index, and storage context
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
)
# tool wrappers used to expose query engines to the agent
from llama_index.core.tools import QueryEngineTool, ToolMetadata
# Milvus integration for LlamaIndex
from llama_index.vector_stores.milvus import MilvusVectorStore
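Since OpenAI supplies both the embedding model and the LLM in this example, LlamaIndex also needs an OpenAI API key. A minimal sketch, assuming you keep the key in an environment variable:

import os

# LlamaIndex's OpenAI integrations pick the key up from this environment variable
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key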
Spinning up Milvus
A critical step in doing RAG with a vector database backend is spinning up our vector database. Milvus is a distributed vector database, and it spins up in two ways. First, we can directly import the default_server and start() it. Second, we can spin it up via Docker Compose. The code for both is shown below.
from milvus import default_server
default_server.start()
Alternatively, run docker compose up -d in a terminal from the same directory as your docker-compose.yml file.
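Either way, a quick sanity check with pymilvus confirms the server is reachable before we load any data. This sketch assumes the default port 19530 for Docker Compose; Milvus Lite users would pass default_server.listen_port instead.

from pymilvus import connections, utility

# connect to the running Milvus instance (use default_server.listen_port for Milvus Lite)
connections.connect(host="127.0.0.1", port=19530)
print(utility.get_server_version())  # prints the server version if the connection works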
Loading the Data into Milvus via LlamaIndex
The first step to building a RAG app is loading your data into your vector database. For this tutorial, we do this via LlamaIndex. We use their simple directory reader to read the input files. In this case, we are looking at the financial documents for Lyft and Uber.
# load data
lyft_docs = SimpleDirectoryReader(
    input_files=["./data/10k/lyft_2021.pdf"]
).load_data()
uber_docs = SimpleDirectoryReader(
    input_files=["./data/10k/uber_2021.pdf"]
).load_data()
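It is worth a quick check that the PDFs were actually parsed before building any indexes; a small sketch:

# each loaded item is a LlamaIndex Document; a 10-K usually parses into many of them
print(f"Lyft docs: {len(lyft_docs)}, Uber docs: {len(uber_docs)}")
print(lyft_docs[0].text[:200])  # peek at the first chunk of parsed text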
Once the data is loaded, we need to create a MilvusVectorStore object in LlamaIndex to hold it. We use OpenAI for both the embedding model and the LLM in this example, so we pass a dimension of 1536, the output dimension of OpenAI's default embedding model. The only difference between the two vector stores that hold the data is the collection name.
If you use Milvus Lite instead of Milvus through Docker Compose, you should also pass the host and port to ensure that you connect to the right instance. We get the port from the default server through its listen_port attribute.
# build index
milvuslite = False  # set to True if you are running Milvus Lite inside the notebook

vector_store_lyft = MilvusVectorStore(dim=1536, collection_name="lyft", overwrite=True)
vector_store_uber = MilvusVectorStore(dim=1536, collection_name="uber", overwrite=True)

# for milvus lite users
if milvuslite:
    vector_store_lyft = MilvusVectorStore(host="localhost", port=default_server.listen_port, dim=1536, collection_name="lyft", overwrite=True)
    vector_store_uber = MilvusVectorStore(host="localhost", port=default_server.listen_port, dim=1536, collection_name="uber", overwrite=True)
We have to be able to pass the data around, not just load it. LlamaIndex handles this abstraction through a StorageContext object. We create a storage context from each vector store, for both Lyft and Uber, and then build an index on top of each. The last two lines in this block persist those indexes to your local disk so you can retrieve them from storage next time.
storage_context_lyft = StorageContext.from_defaults(vector_store=vector_store_lyft)
storage_context_uber = StorageContext.from_defaults(vector_store=vector_store_uber)
lyft_index = VectorStoreIndex.from_documents(lyft_docs, storage_context=storage_context_lyft)
uber_index = VectorStoreIndex.from_documents(uber_docs, storage_context=storage_context_uber)
# persist index
lyft_index.storage_context.persist(persist_dir="./storage/lyft")
uber_index.storage_context.persist(persist_dir="./storage/uber")
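Because Milvus itself holds the vectors, on a later run you can skip re-reading and re-embedding the PDFs and reconnect to the existing collections instead. A minimal sketch, assuming the collections created above already exist:

# MilvusVectorStore and VectorStoreIndex were imported at the top of the notebook
vector_store_lyft = MilvusVectorStore(dim=1536, collection_name="lyft", overwrite=False)
lyft_index = VectorStoreIndex.from_vector_store(vector_store=vector_store_lyft)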
Creating the Query Engine Tools for the AI Agent
For an AI Agent to do RAG, it needs tools that can query the vector database. In this next block, we turn the vector indexes we created into query engines that retrieve the top three most similar results.
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
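Before wiring these into an agent, you can sanity-check a query engine directly; the exact wording of the answer will depend on which chunks are retrieved.

# query the Lyft engine on its own, without the agent
response = lyft_engine.query("What was Lyft's revenue in 2021?")
print(str(response))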
We then turn the query engines into tools. The ReAct agent for LlamaIndex (imported in the next section) takes a list of tools as part of its input. So, in this section, we create that list of tools. We only need to create two tools, both with nearly identical structures. The Query Engine Tool requires a query engine and metadata. The metadata is used to name the tool and tell the LLM what it does and how to use it.
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
Creating the AI Agent for RAG
Everything is prepared for the last step: putting the pieces of the agent together. Here, we import the ReAct agent and the OpenAI LLM class from LlamaIndex. We define the LLM as GPT-3.5, although you can use any LLM you want. For the agent, we pass our list of tools from earlier, the LLM, and, in this particular example, verbose=True so we can see its “thoughts”.
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# GPT-3.5 as the agent's LLM; swap in another model if you prefer
llm = OpenAI(model="gpt-3.5-turbo-0613")

agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
)

response = agent.chat("What was Lyft's revenue growth in 2021?")
print(str(response))
When asked what Lyft’s revenue growth in 2021 was, we expect the agent to select the lyft_10k tool and answer with the growth figure it retrieves from the filing.
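Because the agent holds both tools, you can also ask a question that forces it to consult both filings; a short sketch:

# a question that should make the agent call both the lyft_10k and uber_10k tools
response = agent.chat(
    "Compare the revenue growth of Uber and Lyft in 2021, then give your analysis."
)
print(str(response))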
Summary of Building an AI Agent for RAG with Milvus and LlamaIndex
In this article, we built an AI Agent for RAG using Milvus, LlamaIndex, and GPT-3.5. RAG is an LLM-based app architecture that uses vector databases to inject your data into an LLM as context. An AI Agent is an LLM-based application that can use other tools. To do RAG with an AI Agent, we provide it with the tools needed to query a vector database.
In this tutorial, we showed how to build this architecture using sample data from Lyft and Uber. We first injected the data into Milvus so we could do RAG. Then, we turned our Milvus collections into query engines and the query engines into usable tools. Finally, we gave those tools to an AI Agent and used them to perform RAG on our documents.