Langfuse and Zilliz Cloud Integration
Langfuse and Zilliz Cloud integrate to provide observability and analytics for RAG applications. Langfuse contributes an open-source LLM engineering platform for tracing, quality monitoring, and user analysis; Zilliz Cloud contributes a high-performance vector database for efficient retrieval in production LLM systems.
What is Langfuse
Langfuse is an open-source LLM engineering platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications. It offers comprehensive observability and product analytics: tracing of complex LLM application contexts (chained/agentic calls), quality monitoring through model-based evaluations and user feedback, and user analysis that classifies inputs and reveals real-world usage patterns. All platform features are natively integrated to accelerate the development workflow.
By integrating with Zilliz Cloud (fully managed Milvus), Langfuse provides deep observability into RAG pipelines built on scalable vector databases, enabling teams to monitor embedding quality and relevance, optimize vector search performance through detailed analytics, and fine-tune retrieval processes to align with user needs.
Benefits of the Langfuse + Zilliz Cloud Integration
- End-to-end RAG tracing: Langfuse captures the full request lifecycle — from embedding generation to Zilliz Cloud vector retrieval to LLM response — providing complete visibility into RAG pipeline performance.
- Quality monitoring with scoring: Langfuse attaches scores to production traces for output assessment, supporting model-based evaluations, user feedback, and manual labeling to monitor quality trends across Zilliz Cloud-backed RAG applications.
- Embedding and retrieval analytics: The integration enables monitoring of embedding quality and relevance, helping teams optimize vector search performance and accuracy in Zilliz Cloud through detailed analytics.
- Framework-agnostic with LlamaIndex support: Langfuse provides framework-agnostic SDKs with automated LlamaIndex instrumentation, making it easy to add observability to RAG pipelines using Zilliz Cloud's MilvusVectorStore.
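For intuition, a trace in this model is a tree of nested spans (embedding call, vector retrieval, LLM call) with timings attached. The following is a toy, purely local illustration of that shape — these classes are invented for the example and are not the Langfuse SDK:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One observation in a trace: a named step with optional child steps."""
    name: str
    duration_ms: float = 0.0
    children: list["Span"] = field(default_factory=list)

    def total_ms(self) -> float:
        # A span's total time includes all nested child spans
        return self.duration_ms + sum(c.total_ms() for c in self.children)

# A RAG request traced as nested spans
trace = Span("rag-query", children=[
    Span("embed-query", duration_ms=12.0),
    Span("zilliz-retrieval", duration_ms=35.0),
    Span("llm-generation", duration_ms=420.0),
])
print(trace.total_ms())  # 467.0
```

In the real integration, Langfuse builds and stores this hierarchy for you; the dashboard then breaks a request's latency and cost down per span.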
How the Integration Works
Langfuse serves as the observability and analytics layer, capturing traces of all LLM application interactions including inference, embedding retrieval, API usage, and system interactions. It provides a callback handler that integrates with LlamaIndex to automatically trace queries, index operations, and chat interactions.
Zilliz Cloud serves as the vector database layer through LlamaIndex's MilvusVectorStore, storing and indexing document embeddings for fast similarity search. It handles the retrieval step in the RAG pipeline, which Langfuse then traces and monitors for quality.
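Under the hood, the retrieval step ranks stored embeddings by similarity to the query embedding. A minimal self-contained sketch of cosine-similarity ranking follows — purely illustrative, with toy 3-dimensional vectors and invented helper names; Zilliz Cloud performs this at scale using approximate nearest-neighbor indexes:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], stored: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda doc_id: cosine_similarity(query, stored[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" (real ones are e.g. 1536-dimensional)
stored = {
    "doc1": [1.0, 0.0, 0.1],
    "doc2": [0.0, 1.0, 0.1],
    "doc3": [0.9, 0.1, 0.0],
}
print(top_k([1.0, 0.0, 0.0], stored))  # ['doc1', 'doc3']
```

The similarity scores and ranked results are exactly the kind of retrieval detail that Langfuse surfaces per trace, which is what makes relevance regressions visible.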
Together, Langfuse and Zilliz Cloud create an observable RAG system: documents are indexed and stored in Zilliz Cloud via LlamaIndex, Langfuse's callback handler captures traces of every query and retrieval operation, and teams can view detailed traces in the Langfuse dashboard — monitoring retrieval quality, LLM response accuracy, and user interaction patterns to iteratively improve their RAG applications.
Step-by-Step Guide
1. Install Dependencies
```shell
$ pip install llama-index langfuse llama-index-vector-stores-milvus --upgrade
```

2. Initialize Langfuse and OpenAI
Get your API keys from the Langfuse project settings and set up environment variables:
```python
import os

os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # 🇪🇺 EU region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com"  # 🇺🇸 US region

os.environ["OPENAI_API_KEY"] = ""
```

Set up the Langfuse callback handler with LlamaIndex:
```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler

langfuse_callback_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_callback_handler])
```

3. Create Documents and Index with Milvus
Create sample documents and build a vector store index using Milvus:
```python
from llama_index.core import Document, VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore

doc1 = Document(text="""
Maxwell "Max" Silverstein, a lauded movie director, screenwriter, and producer, was born on October 25, 1978, in Boston, Massachusetts...
""")
doc2 = Document(text="""
Throughout his career, Silverstein has been celebrated for his diverse range of filmography and unique narrative technique...
""")

# A local file URI runs Milvus Lite for the demo; point uri (and token) at
# your Zilliz Cloud cluster for production
vector_store = MilvusVectorStore(
    uri="tmp/milvus_demo.db", dim=1536, overwrite=False
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [doc1, doc2], storage_context=storage_context
)
```

4. Query and Chat with Tracing
Run queries and chat interactions — all automatically traced by Langfuse:
```python
# Query
response = index.as_query_engine().query("What did he do growing up?")
print(response)

# Chat
response = index.as_chat_engine().chat("What did he do growing up?")
print(response)
```

5. Explore Traces in Langfuse
Flush the callback handler to immediately see results in Langfuse:
```python
langfuse_callback_handler.flush()
```

You can now view traces of your index and query operations in your Langfuse project dashboard, including detailed breakdowns of embedding generation, vector retrieval, and LLM response generation.
Learn More
- Using Langfuse to Evaluate RAG Quality — Official Milvus tutorial for tracing RAG with Langfuse
- The Path to Production: LLM Application Evaluations and Observability — Zilliz blog on LLM observability
- Langfuse LlamaIndex Integration Docs — Official Langfuse LlamaIndex integration documentation
- Langfuse GitHub Repository — Langfuse source code and community resources
- Langfuse Documentation — Official Langfuse documentation