Mistral AI and Zilliz Cloud Integration
Mistral AI and Zilliz Cloud integrate to build multi-agent RAG systems with metadata filtering: Mistral AI's frontier LLMs, embedding models, and advanced function calling capabilities pair with Zilliz Cloud's high-performance vector database for scalable similarity search and intelligent data retrieval.
What is Mistral AI
Mistral AI is a research lab building top-tier open-source and frontier LLMs and embedding models. Its models include Mistral Nemo, with a context window of up to 128k tokens; Mistral Large, with advanced reasoning and function calling capabilities; and the mistral-embed embedding model, trained with retrieval in mind. Mistral AI's models have proven particularly strong at RAG and function calling, making them well-suited for building intelligent, agentic AI systems.

By integrating with Zilliz Cloud (fully managed Milvus), Mistral AI's LLMs and embedding models are paired with a scalable vector database, enabling developers to build advanced RAG systems with metadata filtering, multi-agent orchestration, and automated data search, leveraging Mistral's function calling capabilities to coordinate intelligent retrieval from Zilliz Cloud's vector store.
Benefits of the Mistral AI + Zilliz Cloud Integration
- Advanced function calling for RAG: Mistral AI's native function calling capabilities allow agents to intelligently decide when and how to query Zilliz Cloud's vector store, enabling automated and context-aware data retrieval.
- Multi-agent orchestration: Mistral Large orchestrates multiple agent services that retrieve and process information from Zilliz Cloud, enabling complex multi-step queries across different data sources and document collections.
- Metadata filtering for precise retrieval: The integration supports Milvus metadata filtering, allowing agents to automatically generate and apply filters (e.g., company name, year, file name) to narrow search results and avoid confusion across large datasets.
- Embedding and LLM from one provider: Mistral AI provides both embedding models (mistral-embed) and LLMs (Mistral Nemo, Mistral Large), creating a cohesive pipeline from embedding generation to intelligent response, all stored and retrieved through Zilliz Cloud.
How the Integration Works
Mistral AI provides both the embedding model (mistral-embed) for generating vector representations and LLMs (Mistral Nemo, Mistral Large) for reasoning, function calling, and response generation. Mistral Large acts as the orchestrator in multi-agent systems, coordinating tool calls and agent services through its advanced function calling capabilities.

Zilliz Cloud serves as the vector database layer through LlamaIndex's MilvusVectorStore, storing and indexing document embeddings for fast similarity search with metadata filtering support. It provides the retrieval backend for agent services, enabling efficient vector search with customizable filters.
Together, Mistral AI and Zilliz Cloud create an intelligent multi-agent RAG system: documents are embedded using Mistral's embedding model and stored in Zilliz Cloud with metadata. Agents powered by Mistral Nemo handle individual retrieval tasks with metadata filtering, while Mistral Large orchestrates the overall workflow — coordinating multiple agents, extracting metadata filters from user queries, and generating accurate responses based on precisely retrieved context.
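The division of labor described above can be sketched with plain-Python stubs: an orchestrator routes a query to per-document agents and collects their answers. All function names here are illustrative placeholders, not the llama-agents API; the real pipeline in the steps below wires this flow through Mistral Large and Zilliz Cloud.

```python
# Illustrative stubs for the orchestration flow: each agent would run
# filtered retrieval over one document; the orchestrator picks which
# agents to invoke based on the user query.

def lyft_agent(query: str) -> str:
    """Stub agent: stands in for filtered retrieval over lyft_2021.pdf."""
    return "Lyft 2021 context for: " + query

def uber_agent(query: str) -> str:
    """Stub agent: stands in for filtered retrieval over uber_2021.pdf."""
    return "Uber 2021 context for: " + query

AGENTS = {"lyft": lyft_agent, "uber": uber_agent}

def orchestrate(query: str) -> list[str]:
    """Orchestrator step: invoke every agent whose company the query mentions."""
    q = query.lower()
    return [agent(query) for name, agent in AGENTS.items() if name in q]

results = orchestrate("Compare Lyft and Uber total revenues in 2021")
```

In the real system, the routing decision is made by Mistral Large through function calling rather than keyword matching, and each agent's retrieval hits Zilliz Cloud with a metadata filter.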
Step-by-Step Guide
1. Install Dependencies
```shell
$ pip install llama-agents pymilvus openai python-dotenv
$ pip install llama-index-vector-stores-milvus llama-index-readers-file llama-index-llms-ollama llama-index-llms-mistralai llama-index-embeddings-mistralai
```
2. Set Up API Key and Embedding Model
Get your Mistral API key from the Mistral Cloud Console and define the embedding model:
```python
from dotenv import load_dotenv
import os

load_dotenv()

from llama_index.core import Settings
from llama_index.embeddings.mistralai import MistralAIEmbedding

Settings.embed_model = MistralAIEmbedding(model_name="mistral-embed")
```
3. Set Up LLM and Load Data into Milvus
Define the LLM, download data, and create a vector store index:
```python
from llama_index.llms.ollama import Ollama

Settings.llm = Ollama("mistral-nemo")
```

```shell
$ mkdir -p 'data/10k/'
$ wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
$ wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'
```

```python
from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext

input_files = ["./data/10k/lyft_2021.pdf", "./data/10k/uber_2021.pdf"]

vector_store = MilvusVectorStore(
    uri="./milvus_demo.db",
    dim=1024,
    overwrite=False,
    collection_name="companies_docs",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
docs = SimpleDirectoryReader(input_files=input_files).load_data()
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
```
Setting the uri as a local file, e.g. ./milvus.db, is the most convenient method, as it automatically uses Milvus Lite. If you have a large amount of data, you can set up a more performant Milvus server on Docker or Kubernetes. If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the uri and token, which correspond to the Public Endpoint and API Key in Zilliz Cloud.
4. Define Tools and Query with Function Calling
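For the Zilliz Cloud case, only the connection parameters of the vector store change. A minimal config sketch with placeholder credentials (substitute your cluster's actual Public Endpoint and API Key):

```python
from llama_index.vector_stores.milvus import MilvusVectorStore

# Placeholder credentials: replace with your cluster's Public Endpoint and API Key.
vector_store = MilvusVectorStore(
    uri="https://<your-cluster>.zillizcloud.com",  # Public Endpoint
    token="<your-api-key>",                        # API Key
    dim=1024,                     # mistral-embed produces 1024-dim vectors
    collection_name="companies_docs",
    overwrite=False,
)
```

The rest of the pipeline (storage context, document loading, index creation) is unchanged.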
Define query engine tools and use Mistral's predict_and_call for function calling:
```python
from llama_index.core.tools import QueryEngineTool, ToolMetadata

company_engine = index.as_query_engine(similarity_top_k=3)

query_engine_tools = [
    QueryEngineTool(
        query_engine=company_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description="Provides information about Lyft financials for year 2021.",
        ),
    ),
    QueryEngineTool(
        query_engine=company_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description="Provides information about Uber financials for year 2021.",
        ),
    ),
]

llm = Ollama(model="mistral-nemo")

response = llm.predict_and_call(
    query_engine_tools,
    user_msg="Could you please provide a comparison between Lyft and Uber's total revenues in 2021?",
    allow_parallel_tool_calls=True,
)
print(response)
```
5. Apply Metadata Filtering
Use Milvus metadata filtering to create a filtered query engine:
```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_name", value="lyft_2021.pdf")]
)
filtered_query_engine = index.as_query_engine(filters=filters)
```
6. Orchestrate Multi-Agent System with Mistral Large
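In the agentic setup, the filter values are not hard-coded: the orchestrating LLM derives them from the user's query via function calling. A minimal pure-Python sketch of that derivation step (the helper and the company-to-file mapping are illustrative, not part of LlamaIndex):

```python
# Illustrative helper: map companies mentioned in a query to the
# (key, value) pairs an agent would pass to ExactMatchFilter.
COMPANY_FILES = {"lyft": "lyft_2021.pdf", "uber": "uber_2021.pdf"}

def filters_from_query(query: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs for building metadata filters."""
    q = query.lower()
    return [("file_name", f) for name, f in COMPANY_FILES.items() if name in q]

print(filters_from_query("What are the risk factors for Uber?"))
```

Each returned pair maps onto an ExactMatchFilter(key=..., value=...) in a filtered query engine like the one above; in the full system, Mistral Large performs this extraction itself.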
Use Mistral Large to orchestrate multiple agent services with llama-agents:
```python
from llama_agents import (
    AgentService,
    ToolService,
    LocalLauncher,
    MetaServiceTool,
    ControlPlaneServer,
    SimpleMessageQueue,
    AgentOrchestrator,
)
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.llms.mistralai import MistralAI

message_queue = SimpleMessageQueue()

# Mistral Large drives the orchestration on the control plane.
control_plane = ControlPlaneServer(
    message_queue=message_queue,
    orchestrator=AgentOrchestrator(llm=MistralAI("mistral-large-latest")),
)

tool_service = ToolService(
    message_queue=message_queue,
    tools=query_engine_tools,
    running=True,
    step_interval=0.5,
)

# Note: the top-level await assumes an async context such as a notebook.
meta_tools = [
    await MetaServiceTool.from_tool_service(
        t.metadata.name,
        message_queue=message_queue,
        tool_service=tool_service,
    )
    for t in query_engine_tools
]

worker1 = FunctionCallingAgentWorker.from_tools(
    meta_tools, llm=MistralAI("mistral-large-latest")
)
agent1 = worker1.as_agent()

agent_server_1 = AgentService(
    agent=agent1,
    message_queue=message_queue,
    description="Used to answer questions over different companies for their Financial results",
    service_name="Companies_analyst_agent",
)

launcher = LocalLauncher(
    [agent_server_1, tool_service],
    control_plane,
    message_queue,
)

query_str = "What are the risk factors for Uber?"
result = launcher.launch_single(query_str)
print(result)
```
Learn More
- Multi-agent Systems with Mistral AI, Milvus and Llama-agents — Official Milvus tutorial for multi-agent systems with Mistral AI
- Build RAG Chatbot with LangChain, Milvus, and Mistral AI — Zilliz RAG tutorial with Mistral AI
- Mistral AI Documentation — Official Mistral AI documentation
- Mistral AI Embeddings — Mistral AI embedding model documentation
- Mistral AI Function Calling — Mistral AI function calling documentation