LlamaIndex and Zilliz Cloud Integration
LlamaIndex and Zilliz Cloud integrate to build powerful Retrieval-Augmented Generation (RAG) applications. The combination pairs LlamaIndex's flexible data framework for LLM applications with Zilliz Cloud's high-performance vector database, enabling efficient document storage, retrieval, and context-aware AI responses.
What is LlamaIndex
LlamaIndex (formerly GPT Index) is a simple, flexible data framework for connecting custom data sources to large language models (LLMs). It facilitates the ingestion, structuring, and retrieval of private or domain-specific data, addressing a critical gap: while LLMs are pre-trained on extensive public datasets, they often lack domain-specific knowledge, which can lead to hallucinations or incorrect answers.
By integrating with Zilliz Cloud (fully managed Milvus), LlamaIndex gains access to a fully managed, high-performance vector database that efficiently stores and retrieves document embeddings at scale, enabling developers to build production-ready RAG applications, document Q&A systems, and knowledge-based chatbots with minimal infrastructure management.
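As a hedged configuration sketch of that integration point: the endpoint and key below are placeholders, and dim=1536 is an assumption matching a 1536-dimensional embedding model. Pointing LlamaIndex's MilvusVectorStore at Zilliz Cloud rather than a local Milvus instance typically means supplying a uri and token:

```python
from llama_index.vector_stores.milvus import MilvusVectorStore

# Placeholder values: copy the Public Endpoint and API Key
# from your cluster's details page in the Zilliz Cloud console.
vector_store = MilvusVectorStore(
    uri="https://<public-endpoint>.zillizcloud.com",  # Public Endpoint (placeholder)
    token="<api-key>",                                # API Key (placeholder)
    dim=1536,         # must match your embedding model's output dimension
    overwrite=False,  # keep any existing collection data
)
```

The same object then plugs into a StorageContext exactly as in the local-file examples later in this guide.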
Benefits of the LlamaIndex + Zilliz Cloud Integration
- Flexible data ingestion with scalable storage: LlamaIndex handles the ingestion and structuring of diverse data sources, while Zilliz Cloud provides scalable vector storage and fast similarity search to power retrieval at any scale.
- Simple index creation and querying: With just a few lines of code, developers can create a vector store index backed by Zilliz Cloud, insert documents, and query them using natural language.
- Metadata filtering for precise retrieval: The integration supports metadata filtering, allowing applications to narrow search results by specific attributes such as file name, source, or custom tags for more targeted retrieval.
- Dual use as index or data connector: LlamaIndex can use Zilliz Cloud as an internal vector store index for direct document storage and querying, or as an external data connector that retrieves data and integrates it into LlamaIndex structures for further processing.
- Production-ready RAG pipelines: The combination delivers production-grade performance and reliability, making it straightforward to move from prototype to production deployment.
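To make the metadata-filtering idea above concrete, here is a toy illustration in plain Python: a list of dicts stands in for the stored chunks, and a simple equality filter stands in for the server-side filtering that Milvus/Zilliz Cloud actually performs.

```python
# Each stored chunk carries metadata alongside its text (and, in a real
# deployment, its embedding).
chunks = [
    {"text": "Revenue grew year over year...", "file_name": "uber_2021.pdf"},
    {"text": "What I Worked On...", "file_name": "paul_graham_essay.txt"},
    {"text": "Risk factors include...", "file_name": "uber_2021.pdf"},
]

def exact_match_filter(items, key, value):
    """Keep only items whose metadata field equals the given value."""
    return [item for item in items if item.get(key) == value]

hits = exact_match_filter(chunks, "file_name", "uber_2021.pdf")
print(len(hits))  # -> 2
```

In the real integration this narrowing happens inside the vector database before similarity ranking, so irrelevant files never compete for the top results.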
How the Integration Works
LlamaIndex serves as the data framework layer, handling document loading, chunking, embedding generation, and query orchestration. It provides a simple API for ingesting data from various sources (files, PDFs, APIs) and converting them into vector representations that can be stored and retrieved efficiently.
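As a rough sketch of what the chunking step involves (a simplified character-based stand-in, not LlamaIndex's actual node parser, which splits on sentences and tokens), a document can be cut into overlapping chunks like this:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks (simplified illustration)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # overlap preserves context across chunks
    return chunks

doc = "x" * 500
chunks = chunk_text(doc)
print(len(chunks), len(chunks[0]))  # -> 4 200
```

Each chunk is then embedded separately, so retrieval can return just the passages relevant to a query rather than whole documents.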
Zilliz Cloud serves as the vector database layer through the MilvusVectorStore, storing and indexing the document embeddings generated by LlamaIndex. It provides high-performance similarity search with low latency, enabling applications to retrieve the most relevant context from large knowledge bases efficiently.
Together, LlamaIndex and Zilliz Cloud create a complete RAG solution: LlamaIndex ingests and processes documents, stores their embeddings in Zilliz Cloud via MilvusVectorStore, and when a query comes in, retrieves the most relevant documents through vector similarity search to generate contextually informed responses using the LLM.
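To make the retrieval step concrete, here is a minimal, self-contained sketch of vector similarity search using toy three-dimensional "embeddings" and plain cosine similarity (real embeddings have hundreds of dimensions, and Zilliz Cloud uses approximate-nearest-neighbor indexes rather than this brute-force scan):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings for three stored chunks (real ones come from an embedding model).
store = {
    "chunk-about-cats": [1.0, 0.0, 0.0],
    "chunk-about-dogs": [0.0, 1.0, 0.0],
    "chunk-about-tax-law": [0.0, 0.0, 1.0],
}

query = [0.9, 0.3, 0.1]  # toy query embedding, closest to the "cats" direction
ranked = sorted(store, key=lambda k: cosine_similarity(query, store[k]), reverse=True)
print(ranked[0])  # -> chunk-about-cats
```

The top-ranked chunks are what LlamaIndex passes to the LLM as context when generating the answer.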
Step-by-Step Guide
1. Install Dependencies
Install the required packages:
```shell
$ pip install "pymilvus>=2.4.2"
$ pip install llama-index-vector-stores-milvus
$ pip install llama-index
```
2. Set Up OpenAI
Add the OpenAI API key to access ChatGPT:
```python
import openai

openai.api_key = "sk-***********"
```
3. Prepare Data
Download sample data for the demo:
```shell
! mkdir -p 'data/'
! wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'
! wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/uber_2021.pdf'
```
4. Generate Documents and Create an Index
Load the document using SimpleDirectoryReader and create a vector store index with MilvusVectorStore:
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore

# load documents
documents = SimpleDirectoryReader(
    input_files=["./data/paul_graham_essay.txt"]
).load_data()
print("Document ID:", documents[0].doc_id)

# create index with MilvusVectorStore
vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1536, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```
For the parameters of MilvusVectorStore:
- Setting the uri as a local file, e.g. ./milvus.db, is the most convenient method, as it automatically uses Milvus Lite to store all data in this file.
- If you have a large amount of data, you can set up a more performant Milvus server on Docker or Kubernetes.
- If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the uri and token, which correspond to the Public Endpoint and API Key in Zilliz Cloud.

5. Query the Data
Ask questions against the index, which uses the stored data as the knowledge base:
```python
query_engine = index.as_query_engine()
res = query_engine.query("What did the author learn?")
print(res)
```
6. Overwrite and Append Data
Test overwriting existing data by creating a new index with overwrite=True:
```python
from llama_index.core import Document

vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1536, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="The number that is being searched for is ten.")],
    storage_context,
)
query_engine = index.as_query_engine()
res = query_engine.query("Who is the author?")
print(res)
```
Add data to an existing index by setting overwrite=False:
```python
del index, vector_store, storage_context, query_engine

vector_store = MilvusVectorStore(uri="./milvus_demo.db", overwrite=False)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
res = query_engine.query("What is the number?")
print(res)
```
7. Metadata Filtering
Load multiple documents and filter query results by metadata attributes:
```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Load all documents
documents_all = SimpleDirectoryReader("./data/").load_data()

vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1536, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents_all, storage_context)

# Filter to only retrieve from a specific file
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_name", value="uber_2021.pdf")]
)
query_engine = index.as_query_engine(filters=filters)
res = query_engine.query("What challenges did the disease pose for the author?")
print(res)
```
Learn More
- Retrieval-Augmented Generation (RAG) with Milvus and LlamaIndex — Official Milvus tutorial for building RAG with LlamaIndex
- Building an AI Agent for RAG with Milvus and LlamaIndex — Zilliz blog on building AI agents with LlamaIndex and Milvus
- How to Connect to Milvus Lite Using LangChain and LlamaIndex — Zilliz blog on connecting to Milvus Lite
- How to Build RAG with Milvus Lite, Llama3 and LlamaIndex — Zilliz tutorial on building RAG with Llama3
- LlamaIndex Documentation — Official LlamaIndex documentation