How to Connect to Milvus Lite Using LangChain and LlamaIndex


Jun 07, 2024 · 4 min read

Milvus Lite, released just one week ago on May 31, is now the default method for third-party connectors like LangChain and LlamaIndex to connect to Milvus, the popular open-source vector database.


Method	Control Level for Retrieval Process	Time (seconds)
LlamaIndex	No control	2156
LangChain	Full control	8
Milvus Lite API	Full control	28

Table: Timings using the same HuggingFace embedding model (BAAI/bge-large-en-v1.5) and the same HTML data files.
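To put the table in perspective, the short calculation below expresses each timing as a multiple of the fastest one. The numbers are copied from the table; the variable names are only for illustration.

```python
# Timings in seconds, copied from the table above.
timings = {"LlamaIndex": 2156, "LangChain": 8, "Milvus Lite API": 28}

fastest = min(timings.values())
# Express every method as a multiple of the fastest one.
relative = {name: seconds / fastest for name, seconds in timings.items()}

for name, factor in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {timings[name]} s ({factor:.1f}x the fastest)")
```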

The result? If you’re looking for the best balance between high control over Milvus settings and fast setup, using the Milvus Lite APIs directly is the optimal choice. The full code and timings are available on my GitHub.

In the following sections, we’ll cover:

Connecting to Milvus Lite using LlamaIndex
Connecting to Milvus Lite using LangChain
Connecting to Milvus Lite using Milvus Lite APIs

Connecting to Milvus Lite Using LlamaIndex

It’s easy to get started with LlamaIndex, but it is slow: connecting and creating a collection took about 2156 seconds (roughly 36 minutes) in my timings.

from llama_index.core import (
   Settings,
   StorageContext,
   VectorStoreIndex,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.milvus import MilvusVectorStore


# 1. Define the embedding model.
# Settings replaces ServiceContext, which is deprecated in recent LlamaIndex releases.
Settings.embed_model = HuggingFaceEmbedding(
   model_name="BAAI/bge-large-en-v1.5")
# LlamaIndex hides this but we need it to create the vector store!
EMBEDDING_DIM = 1024


# 2. Create a Milvus collection from the documents and embeddings.
# A local file path as the uri tells the connector to use Milvus Lite.
vector_store = MilvusVectorStore(
   uri="./milvus_demo.db",
   dim=EMBEDDING_DIM,
   overwrite=True
)
storage_context = StorageContext.from_defaults(
   vector_store=vector_store
)
# `docs` is the list of LlamaIndex documents loaded earlier (loading not shown).
llamaindex = VectorStoreIndex.from_documents(
   # Chunk, embed, insert too slow!  Just use one document.
   docs[:1],
   storage_context=storage_context
)
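Because the dimension is hard-coded as EMBEDDING_DIM = 1024, a mismatch between the model and the collection would otherwise only surface when the insert fails. Here is a minimal sketch of a check you could run on a batch of vectors first. The helper name `check_dimensions` and the sample data are hypothetical; they are not part of LlamaIndex or Milvus.

```python
def check_dimensions(vectors, expected_dim):
    """Return the positions of vectors whose length differs from expected_dim."""
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_dim]


EMBEDDING_DIM = 1024
# Made-up vectors: the second one has the wrong length on purpose.
vectors = [[0.0] * 1024, [0.0] * 768, [0.0] * 1024]

bad = check_dimensions(vectors, EMBEDDING_DIM)
print(f"Vectors with an unexpected dimension: {bad}")
```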

Connecting to Milvus Lite Using LangChain

It’s easy to get started in LangChain. It takes about 8 seconds to connect and create a collection.

import time

from langchain_milvus import Milvus
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter


# 1. Define the embedding model.
model_name = "BAAI/bge-large-en-v1.5"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
embed_model = HuggingFaceEmbeddings(
   model_name=model_name,
   model_kwargs=model_kwargs,
   encode_kwargs=encode_kwargs)
EMBEDDING_DIM = embed_model.dict()['client'].get_sentence_embedding_dimension()


# 2. Create a Milvus collection from the documents and embeddings.
# `docs` is the list of LangChain documents loaded earlier (loading not shown).
start_time = time.time()
vectorstore = Milvus.from_documents(
   documents=docs,
   embedding=embed_model,
   # A local file path as the uri selects Milvus Lite.
   connection_args={"uri": "./milvus_demo.db"},
   # Override LangChain default values for Milvus.
   consistency_level="Eventually",
   drop_old=True,
   index_params={
       "metric_type": "COSINE",
       "index_type": "AUTOINDEX",
       "params": {}}
)
print(f"Created collection in {time.time() - start_time:.0f} seconds")
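The LangChain snippet starts a timer with time.time(). If you want to time all three methods the same way, one option is a small context manager like the sketch below. The name `timed` and the `results` dictionary are my own illustration, and time.sleep stands in for the real call to Milvus.from_documents.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results = {}
with timed("sleep_demo", results):
    time.sleep(0.05)  # stand-in for the collection-creation call

print(f"sleep_demo took {results['sleep_demo']:.2f} seconds")
```

Using time.perf_counter rather than time.time avoids surprises if the system clock is adjusted during a long run.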

Connecting to Milvus Lite Using Milvus Lite APIs

But what's happening behind the scenes? Let’s break down the actual steps and make the default values more explicit:

1. Start the Milvus Lite server and connect.
2. Select an embedding model.
3. Create a Milvus database collection.
   1. Define a schema.
   2. Choose an index (data structure for Approximate Nearest Neighbor search).
   3. Choose a distance metric (definition of “close” in vector space).
   4. Choose the consistency level for inserting data.
4. Select a chunking strategy.
5. Transform chunks of data into vectors using the embedding model inference.
6. Insert vector data into Milvus.
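To make the distance-metric step concrete, here is a minimal pure-Python sketch of cosine similarity, the metric selected as COSINE in the LangChain example. It is a brute-force illustration on made-up 2-dimensional vectors, not how Milvus computes or indexes anything internally; the function and variable names are my own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
# Made-up candidate vectors, named after their direction relative to the query.
candidates = {
    "same_direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Rank candidates from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

Cosine similarity ignores vector length and compares direction only, which is why the vector of length 2 still ranks first.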

Here is the Python code using the Milvus Lite API directly. It takes about 28 seconds to connect and create a collection.

import pymilvus


# STEP 1. CONNECT A CLIENT TO THE MILVUS LITE PYTHON SERVER.
from pymilvus import MilvusClient
mc = MilvusClient("milvus_demo.db")


# STEP 2. DOWNLOAD AN OPEN SOURCE EMBEDDING MODEL.
from sentence_transformers import SentenceTransformer
model_name = "BAAI/bge-large-en-v1.5"
encoder = SentenceTransformer(model_name, device='cpu')
# Read the vector size from the model instead of hard-coding it.
EMBEDDING_DIM = encoder.get_sentence_embedding_dimension()


# STEP 3. CREATE A MILVUS COLLECTION AND DEFINE THE DATABASE INDEX.
# Uses Milvus AUTOINDEX, which defaults to HNSW.
COLLECTION_NAME = "MilvusDocs"
# Drop any existing collection with the same name so the run starts clean.
if mc.has_collection(COLLECTION_NAME):
   mc.drop_collection(COLLECTION_NAME)
mc.create_collection(COLLECTION_NAME,
       EMBEDDING_DIM,
       consistency_level="Eventually",
       auto_id=True)


# STEP 4. SPLIT THE DOCUMENTS INTO CHUNKS.
from langchain_community.document_transformers import BeautifulSoupTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Define chunk size and overlap.
chunk_size = 512
chunk_overlap = int(chunk_size * 0.10)  # the splitter expects an integer
# Split the documents into recursive, overlapping chunks.
# `docs` is the list of LangChain documents loaded earlier (loading not shown).
child_splitter = RecursiveCharacterTextSplitter(
   chunk_size=chunk_size,
   chunk_overlap=chunk_overlap,
   length_function=len,  # use built-in Python len function
)
chunks = child_splitter.split_documents(docs)


# STEP 5. TRANSFORM CHUNKS INTO VECTORS USING EMBEDDING MODEL INFERENCE.
list_of_strings = [doc.page_content for doc in chunks if hasattr(doc, 'page_content')]
# encode() returns one dense vector per input string.
embeddings = encoder.encode(list_of_strings)


# STEP 6. INSERT CHUNK LIST INTO MILVUS.
# First, build one dict per chunk.
dict_list = []
for chunk, vector in zip(chunks, embeddings):
   chunk_dict = {
       'chunk': chunk.page_content,
       'source': chunk.metadata.get('source', ""),
       'vector': vector.tolist()
   }
   dict_list.append(chunk_dict)
mc.insert(
   COLLECTION_NAME,
   data=dict_list,
   progress_bar=True)
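If the chunk_size = 512 and 10% overlap settings feel abstract, the sketch below shows the idea with a plain fixed-size sliding window. Note that this is a deliberate simplification: RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences, which this version ignores. The function name `sliding_chunks` and the sample text are made up for illustration.

```python
def sliding_chunks(text, chunk_size=512, overlap_ratio=0.10):
    """Split text into fixed-size windows that overlap by a fraction of chunk_size."""
    overlap = int(chunk_size * overlap_ratio)  # 51 characters for chunk_size=512
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Made-up sample text of 1200 characters.
text = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(text)

print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next one.
print(chunks[0][-51:] == chunks[1][:51])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.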

Choosing the Right Milvus Lite Method

While the LlamaIndex and LangChain connectors for Milvus Lite offer convenience, each method comes with trade-offs between speed and control over retrieval and chunking.

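Using the Milvus Lite APIs directly provides the highest control over Milvus retrieval settings while keeping collection creation fast: about 28 seconds in my timings, compared with about 8 seconds for LangChain and 2156 seconds for LlamaIndex.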

Resources and Further Reading

Milvus Lite docs

Milvus Lite LlamaIndex docs

Milvus Lite LangChain docs

LangChain Milvus docs

LlamaIndex Milvus docs

Updated on Nov 01, 2025

Christy Bergman
Christy Bergman is a passionate Developer Advocate at Zilliz. She previously worked in distributed computing at Anyscale and as a Specialist AI/ML Solutions Architect at AWS. Christy studied applied math, is a self-taught coder, and has published papers, including one with ACM Recsys. She enjoys hiking and bird watching.
