LangChain and Zilliz Cloud Integration
Integrate Zilliz Cloud with LangChain to build LLM applications with semantic search, RAG, and context-aware retrieval using a flexible framework for orchestration, chains, and agents.
What is LangChain
LangChain is an open-source framework for building LLM-powered agents and applications. It provides a prebuilt agent architecture and integrations for models and tools, making it easier to connect to providers such as OpenAI, Anthropic, and Google and start building quickly. LangChain is designed to be easy to use while still flexible enough for custom workflows, and its agents are built on top of LangGraph for capabilities such as durable execution, streaming, persistence, and human-in-the-loop support.
By integrating Zilliz Cloud (fully managed Milvus) with LangChain, you can add high-performance vector search and retrieval to your LLM applications, making it easier to build RAG pipelines, agents, and context-aware AI workflows on top of your own data.
Benefits of the LangChain + Zilliz Cloud Integration
- Makes it easier to build LLM applications that use external knowledge, tools, and retrieval, without having to stitch every component together manually.
- Uses LangChain to orchestrate prompts, chains, and retrieval flows within the application.
- Uses Zilliz Cloud (managed Milvus) as the vector database layer for storing embeddings and running semantic search over your data.
- Works especially well for RAG and document Q&A workloads, where relevant information needs to be retrieved dynamically to improve response quality.
- Lets developers focus more on application logic and less on managing vector infrastructure and retrieval systems.
How the Integration Works
LangChain serves as a framework for developing language model-powered applications. It helps connect language models with contextual sources such as prompt instructions, few-shot examples, external tools, and relevant content stored in vector databases. This makes applications more context-aware and better able to generate responses grounded in the right information. LangChain also enables reasoning-driven workflows, allowing applications to decide how to respond based on the available context and what actions to take next. With its modular components and off-the-shelf chains, developers can either assemble common LLM workflows quickly or customize them for more complex applications.
Zilliz Cloud, the fully managed version of Milvus, provides the vector database layer for storing, indexing, and retrieving embeddings at scale. When application data is converted into vectors, Zilliz Cloud makes it possible to run fast similarity search and return the most relevant content for a given query. This gives LangChain applications a reliable way to fetch external knowledge and use it as context during generation.
Together, LangChain and Zilliz Cloud make it easier to build context-aware LLM applications and retrieval-augmented generation workflows. LangChain manages the application logic, chains, and reasoning flow, while Zilliz Cloud provides high-performance retrieval over your data. This combination helps developers build AI applications that can access relevant knowledge, respond with better grounding, and scale more effectively in production.
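The retrieval loop described above (embed documents, store the vectors, rank them by similarity against an embedded query) can be illustrated with a toy, dependency-free sketch. The `InMemoryVectorStore` class and the hard-coded vectors below are hypothetical stand-ins for a real embedding model and Zilliz Cloud, meant only to show the ranking mechanics:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class InMemoryVectorStore:
    """Toy stand-in for a vector database: stores (vector, text) pairs."""

    def __init__(self):
        self.items = []

    def add(self, vector, text):
        self.items.append((vector, text))

    def search(self, query_vector, k=1):
        # Rank stored texts by cosine similarity to the query vector.
        ranked = sorted(self.items, key=lambda it: cosine(it[0], query_vector), reverse=True)
        return [text for _, text in ranked[:k]]

store = InMemoryVectorStore()
store.add([1.0, 0.0, 0.0], "doc about vector databases")
store.add([0.0, 1.0, 0.0], "doc about cooking")
print(store.search([0.9, 0.1, 0.0], k=1))  # → ['doc about vector databases']
```

In a real pipeline the vectors come from an embedding model and the store is a Milvus/Zilliz collection with proper indexing, but the ranking principle is the same.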
Step-by-Step Guide
1. Install the required packages
Start by installing the LangChain Milvus integration, Milvus Lite, and the OpenAI integration package. The Milvus documentation uses this setup for local development and prototyping. (Milvus)
```shell
pip install -qU langchain-milvus milvus-lite langchain-openai
```

Milvus Lite is included with the latest pymilvus and stores everything in a local file, which makes it a simple option for getting started. For larger-scale workloads, the docs recommend running a full Milvus server instead. (Milvus)

2. Initialize embeddings and connect to Milvus
Next, create your embedding model and initialize the Milvus vector store. In the official example, embeddings are generated with OpenAIEmbeddings(model="text-embedding-3-large"), and the vector store points to a local .db file through Milvus Lite. The same interface can also connect to a running Milvus server by replacing the local URI with a server address such as http://localhost:19530. (Milvus)

```python
from langchain_openai import OpenAIEmbeddings
from langchain_milvus import Milvus

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# The easiest way is to use Milvus Lite, where everything is stored in a local file.
# If you have a Milvus server, you can use the server URI such as "http://localhost:19530".
URI = "./milvus_example.db"

vector_store = Milvus(
    embedding_function=embeddings,
    connection_args={"uri": URI},
)
```

3. Organize data with collections
Milvus supports storing unrelated datasets in different collections within the same Milvus instance. This helps keep contexts separate and makes the vector store easier to manage. The docs first show how to create a collection from documents, then how to load that same collection again by name. (Milvus)
```python
from langchain_core.documents import Document

vector_store_saved = Milvus.from_documents(
    [Document(page_content="foo!")],
    embeddings,
    collection_name="langchain_example",
    connection_args={"uri": URI},
)

vector_store_loaded = Milvus(
    embeddings,
    connection_args={"uri": URI},
    collection_name="langchain_example",
)
```

4. Add documents to the vector store
Once the vector store is ready, you can insert documents with add_documents. The official tutorial uses ten sample documents and assigns each one a UUID before inserting them. Metadata such as "source": "tweet" or "source": "news" is stored alongside each document and can later be used for filtering during search. (Milvus)

```python
from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)
document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)
document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)
document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)
document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)
document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)
document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)
document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)
document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)
document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)
```

5. Delete documents when needed
The tutorial also shows how to remove documents by ID. This is useful when records become outdated or need to be updated through a delete-and-reinsert workflow. In the example below, the last inserted document is deleted using its UUID. (Milvus)
```python
vector_store.delete(ids=[uuids[-1]])
```

6. Run a similarity search
After documents are stored, you can query the vector store directly. The basic example uses similarity_search() with k=2 and a metadata filter expression, expr='source == "tweet"', to search only among tweet-like documents. This shows that LangChain + Milvus supports both vector similarity and structured filtering in the same query. (Milvus)

```python
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    expr='source == "tweet"',
    # param=...  # Search params for the index type
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
```

Example output from the docs:

```
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet', 'pk': 'e991a253-5f37-46ae-850a-82a660e33013'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet', 'pk': 'eb149e29-239a-4e2c-9f99-751cb7207abf'}]
```

7. Search with similarity scores
If you also want the similarity score, use similarity_search_with_score(). The official example queries weather-related content and filters to "news" documents. This is useful when you want to inspect ranking quality or decide how to threshold retrieval results before passing them to an LLM. (Milvus)

```python
results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?",
    k=1,
    expr='source == "news"',
)
for res, score in results:
    print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")
```

Example output from the docs:

```
* [SIM=0.893776] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news', 'pk': 'dbf6560a-1487-4a6e-8797-245d57874f5b'}]
```

8. Turn the vector store into a retriever
For most LangChain applications, especially RAG pipelines, it is often more convenient to convert the vector store into a retriever. The documentation shows this with as_retriever(search_type="mmr", search_kwargs={"k": 1}). Once converted, you can call invoke() and still apply metadata filters through expr. (Milvus)

```python
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("Stealing from the bank is a crime", expr='source == "news"')
```

Example output from the docs:

```
[Document(metadata={'source': 'news', 'pk': '2818c051-5a1a-44cb-9deb-aaaac709f616'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
```

This is the point where the integration becomes especially useful for downstream chains and agents, because retrievers plug naturally into larger LangChain workflows. (Milvus)
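The scores returned by similarity_search_with_score in step 7 can be used to gate what reaches the LLM. The helper below is a hypothetical sketch, not part of the LangChain or Milvus APIs; note that score direction depends on the collection's metric type (higher is closer for IP/COSINE, lower is closer for L2 distance), so the comparison direction is configurable:

```python
def threshold_results(scored_results, cutoff, higher_is_better=True):
    # Keep only (document, score) pairs that pass the cutoff.
    # scored_results is shaped like the output of
    # vector_store.similarity_search_with_score(...).
    if higher_is_better:
        return [(doc, s) for doc, s in scored_results if s >= cutoff]
    return [(doc, s) for doc, s in scored_results if s <= cutoff]

# Illustrative pairs; real entries would be (Document, float) tuples.
pairs = [("forecast doc", 0.89), ("unrelated doc", 0.31)]
print(threshold_results(pairs, cutoff=0.5))  # → [('forecast doc', 0.89)]
```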
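Once a retriever returns Document objects, a common downstream step is stitching their contents into the prompt the LLM sees. The sketch below uses a minimal Doc dataclass as a self-contained stand-in for langchain_core.documents.Document, and format_context is a hypothetical helper rather than a LangChain API:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    # Minimal stand-in for langchain_core.documents.Document.
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_context(docs):
    # Join retrieved documents into one context string, tagging each
    # chunk with its source so answers can be traced back to it.
    parts = []
    for d in docs:
        source = d.metadata.get("source", "unknown")
        parts.append(f"[{source}] {d.page_content}")
    return "\n".join(parts)

retrieved = [Doc("Robbers broke into the city bank.", {"source": "news"})]
prompt = f"Answer using only this context:\n{format_context(retrieved)}\n\nQuestion: ..."
```

In a full RAG chain this formatted context, plus the user's question, becomes the input to the LLM call.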
9. Use partition keys for per-user retrieval
If your application serves multiple users, the Milvus docs recommend using partition_key for multi-tenancy so that users only retrieve their own data. The example uses a namespace field as the partition key and then filters retrieval with expressions like namespace == "ankush". The docs also note that partition key is not available in Milvus Lite, so this part requires a Milvus server running through Docker or Kubernetes. (Milvus)

```python
from langchain_core.documents import Document

docs = [
    Document(page_content="i worked at kensho", metadata={"namespace": "harrison"}),
    Document(page_content="i worked at facebook", metadata={"namespace": "ankush"}),
]

vectorstore = Milvus.from_documents(
    docs,
    embeddings,
    collection_name="partitioned_collection",  # Use a different collection name
    connection_args={"uri": URI},
    # drop_old=True,
    partition_key_field="namespace",  # Use the "namespace" field as the partition key
)
```

To search with the partition key, include it in the boolean expression:

search_kwargs={"expr": '<partition_key> == "xxxx"'}

or

search_kwargs={"expr": '<partition_key> in ["xxx", "xxx"]'}

Replace <partition_key> with the actual field name you designated as the partition key. Milvus then filters entities by partition key before searching. (Milvus)

Example retrievals from the docs:

```python
# This will only get documents for Ankush
vectorstore.as_retriever(search_kwargs={"expr": 'namespace == "ankush"'}).invoke(
    "where did i work?"
)
```

```
[Document(metadata={'namespace': 'ankush', 'pk': 460829372217788296}, page_content='i worked at facebook')]
```

```python
# This will only get documents for Harrison
vectorstore.as_retriever(search_kwargs={"expr": 'namespace == "harrison"'}).invoke(
    "where did i work?"
)
```

```
[Document(metadata={'namespace': 'harrison', 'pk': 460829372217788295}, page_content='i worked at kensho')]
```

Learn More
- Tutorial | Get Started with LangChain and Milvus
- Tutorial | Ultimate Guide to Getting Started with LangChain
- Tutorial | Using LangChain to Self-Query a Vector Database
- Video with Harrison Chase | Memory for LLM applications: Different retrieval techniques for getting the most relevant context
- Video Shorts with Yujian Tang | How to Add Conversational Memory to an LLM Using LangChain
- Video with Lance Martin | Debugging your RAG apps with LangSmith