Voyage AI and Zilliz Cloud Integration

Voyage AI and Zilliz Cloud integrate to power semantic search and multimodal retrieval. The integration combines Voyage AI's embedding models, which are built on contrastive learning research, with Zilliz Cloud's high-performance vector database for efficient similarity search across text and images.

What is Voyage AI
Voyage AI is a team of leading AI researchers focused on advancing RAG technology, drawing on more than five years of research at the Stanford AI Lab and the MIT NLP group. The company provides embedding models that use contrastive learning to create high-quality vector representations of text and images. Its flagship model, voyage-2, achieves higher retrieval accuracy than competing models on benchmarks, along with extended context windows and efficient inference. Domain-specific models are available for code, law, finance, and multilingual use cases.

Through the integration with Zilliz Cloud (fully managed Milvus), Voyage AI's embedding models can be used directly within Zilliz Cloud Pipelines to convert unstructured data into searchable vector embeddings. This enables scalable semantic search, multimodal retrieval, and RAG applications in a turnkey, fully managed environment.
Benefits of the Voyage AI + Zilliz Cloud Integration
- State-of-the-art retrieval accuracy: Voyage AI's embedding models achieve higher retrieval accuracy on benchmarks than competing models, including OpenAI's text embedding models, and Zilliz Cloud's efficient similarity search puts that accuracy to work at query time.
- Multimodal search support: The voyage-multimodal-3 model supports both text and image embeddings, enabling cross-modal search applications where users can query with text to find images or vice versa, all stored and retrieved through Zilliz Cloud.
- Domain-specific models: Voyage AI offers specialized models like voyage-code-2 and voyage-law-2 for code and legal domains, with Zilliz Cloud providing the scalable storage and retrieval layer for these domain-optimized embeddings.
- Turnkey pipeline integration: Voyage AI models are available directly within Zilliz Cloud Pipelines, requiring no separate authentication or external account setup for a seamless embedding and retrieval experience.
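
Each Milvus or Zilliz Cloud collection is created with a fixed vector dimension, so switching between the general, domain-specific, and multimodal models above also means creating the collection with the matching dimension. The lookup table below is a small sketch of that bookkeeping; the dimension values are our understanding of Voyage AI's published defaults for these models and should be treated as assumptions to verify against the Voyage AI documentation. The helper name dimension_for is hypothetical.

```python
# Default output dimensions for the Voyage AI models named on this page.
# Assumption: values reflect Voyage AI's published model specs at the time
# of writing; verify them in the Voyage AI documentation before relying on them.
VOYAGE_MODEL_DIMENSIONS = {
    "voyage-2": 1024,
    "voyage-code-2": 1536,
    "voyage-law-2": 1024,
    "voyage-multimodal-3": 1024,
}


def dimension_for(model_name: str) -> int:
    """Return the vector dimension to use when creating a collection for model_name."""
    try:
        return VOYAGE_MODEL_DIMENSIONS[model_name]
    except KeyError:
        # Fail loudly rather than create a collection with the wrong dimension.
        raise ValueError(f"Unknown model {model_name!r}; look up its dimension first") from None
```

With the guide's client, the collection could then be created with milvus_client.create_collection(collection_name=COLLECTION_NAME, dimension=dimension_for(MODEL_NAME)).
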
How the Integration Works
Voyage AI serves as the embedding layer, converting text and images into high-dimensional vector representations using models like voyage-law-2 for text and voyage-multimodal-3 for multimodal content. It provides separate input types for documents and queries to optimize retrieval accuracy through its contrastive learning-based approach.
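
In the Voyage AI Python client, the document/query distinction is expressed through the input_type argument of Client.embed, which accepts "document" or "query". The sketch below wraps the two cases in helpers so the corpus and the queries are always embedded consistently with the same model; the helper names are hypothetical, and the client is passed in as a parameter so the functions can be exercised without network access.

```python
from typing import List, Sequence


def embed_documents(client, texts: Sequence[str], model: str = "voyage-law-2") -> List[List[float]]:
    """Embed corpus passages; input_type="document" marks them as content to be searched."""
    result = client.embed(texts=list(texts), model=model, input_type="document")
    return result.embeddings


def embed_queries(client, texts: Sequence[str], model: str = "voyage-law-2") -> List[List[float]]:
    """Embed search queries; input_type="query" marks them as retrieval queries."""
    result = client.embed(texts=list(texts), model=model, input_type="query")
    return result.embeddings
```

In practice client would be a voyageai.Client instance; the important design point is that both helpers use the same model, so query and document vectors live in the same space.
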

Zilliz Cloud serves as the vector database layer, storing and indexing the embeddings generated by Voyage AI for fast similarity search. Because Zilliz Cloud is fully managed Milvus, the same client code can target a local Milvus Lite deployment or the fully managed cloud service, enabling efficient retrieval across large collections of text and image embeddings.
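
Conceptually, a similarity search returns the stored vectors closest to the query vector. The brute-force sketch below assumes cosine similarity as the metric and scans every stored vector; it only illustrates what the search computes. Zilliz Cloud instead relies on its indexes so it can answer the same question on large collections without a full scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```
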

Together, Voyage AI and Zilliz Cloud create an end-to-end semantic search solution: documents and images are embedded using Voyage AI's models and stored in Zilliz Cloud. When a user submits a query — whether text or image — Voyage AI embeds it and Zilliz Cloud performs similarity search to find the most relevant results, enabling applications like document retrieval, multimodal search, and RAG-powered question answering.
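
For the RAG case, the retrieved passages are typically placed into the prompt sent to a language model. The sketch below assumes search results in the shape returned by pymilvus MilvusClient.search (one list of hits per query, each hit a dict with "id", "distance", and an "entity" dict holding the requested output_fields) and a "text" output field as in the guide that follows. The function name and prompt wording are illustrative, not part of either product.

```python
from typing import Any, Dict, List


def build_rag_prompt(question: str, hits: List[Dict[str, Any]], text_field: str = "text") -> str:
    """Join retrieved passages into a context block followed by the question.

    hits: the search results for ONE query, e.g. res[0] from MilvusClient.search.
    """
    context = "\n".join(f"- {hit['entity'][text_field]}" for hit in hits)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```
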

Step-by-Step Guide

1. Install Required Packages

$ pip install --upgrade voyageai pymilvus pymupdf pillow

2. Text Search — Generate Embeddings and Store in Milvus

Use Voyage AI's embedding model to generate vector representations and store them in Milvus:

import voyageai
from pymilvus import MilvusClient

MODEL_NAME = "voyage-law-2"
DIMENSION = 1024

voyage_client = voyageai.Client(api_key="<YOUR_VOYAGEAI_API_KEY>")

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

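# input_type="document" tells Voyage AI these texts form the searchable corpus
vectors = voyage_client.embed(
    texts=docs, model=MODEL_NAME, input_type="document", truncation=False
).embeddings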

data = [
    {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"}
    for i in range(len(docs))
]

milvus_client = MilvusClient(uri="milvus_voyage_demo.db")
COLLECTION_NAME = "demo_collection"

if milvus_client.has_collection(collection_name=COLLECTION_NAME):
    milvus_client.drop_collection(collection_name=COLLECTION_NAME)
milvus_client.create_collection(collection_name=COLLECTION_NAME, dimension=DIMENSION)

res = milvus_client.insert(collection_name=COLLECTION_NAME, data=data)
print(res["insert_count"])

About the MilvusClient arguments: setting uri to a local file path, such as ./milvus.db, is the most convenient option, because Milvus Lite is then used automatically to store all data in that file. If you have a large amount of data, you can set up a more performant Milvus server on Docker or Kubernetes and use its server address (for example, http://localhost:19530) as the uri. If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, set uri and token to your cluster's Public Endpoint and API Key.
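
One way to switch between Milvus Lite and Zilliz Cloud without editing code is to read the connection settings from the environment. In the sketch below, ZILLIZ_CLOUD_URI and ZILLIZ_CLOUD_TOKEN are hypothetical variable names chosen for illustration; only the uri and token keyword arguments of MilvusClient come from the guide.

```python
import os
from typing import Dict, Mapping, Optional


def milvus_connection_args(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Choose MilvusClient arguments: Zilliz Cloud if configured, else Milvus Lite.

    ZILLIZ_CLOUD_URI / ZILLIZ_CLOUD_TOKEN are hypothetical variable names for this
    sketch; set them to your cluster's Public Endpoint and API Key.
    """
    env = os.environ if env is None else env
    uri = env.get("ZILLIZ_CLOUD_URI")
    token = env.get("ZILLIZ_CLOUD_TOKEN")
    if uri and token:
        # Fully managed Zilliz Cloud cluster.
        return {"uri": uri, "token": token}
    # Local Milvus Lite: all data lives in this file.
    return {"uri": "./milvus.db"}
```

Usage: milvus_client = MilvusClient(**milvus_connection_args()).
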

3. Text Search — Perform Semantic Search

Generate an embedding for each query and run a vector search:

queries = ["When was artificial intelligence founded?"]

query_vectors = voyage_client.embed(
    texts=queries, model=MODEL_NAME, input_type="query", truncation=False
).embeddings

res = milvus_client.search(
    collection_name=COLLECTION_NAME,
    data=query_vectors,
    limit=2,
    output_fields=["text", "subject"],
)

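# res contains one list of hits per query, in the same order as queries
for q, hits in zip(queries, res):
    print("Query:", q)
    for hit in hits:
        print(hit["distance"], hit["entity"]["text"])
    print()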
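
A top-k search always returns up to limit hits, even when none of them is a close match. The sketch below post-filters the results by score. It assumes a similarity metric such as COSINE, where a larger distance value means a closer match (the opposite holds for L2); the helper name and the cutoff you pass are illustrative choices, not values from the guide.

```python
from typing import Any, Dict, List, Sequence, Tuple


def hits_above(
    queries: Sequence[str],
    res: Sequence[Sequence[Dict[str, Any]]],
    min_score: float,
    text_field: str = "text",
) -> Dict[str, List[Tuple[str, float]]]:
    """Map each query to the (text, score) pairs whose score is at least min_score.

    Assumes higher "distance" means more similar (true for COSINE and IP, not for L2).
    """
    filtered = {}
    for query, hits in zip(queries, res):
        filtered[query] = [
            (hit["entity"][text_field], hit["distance"])
            for hit in hits
            if hit["distance"] >= min_score
        ]
    return filtered
```

Usage with the variables above: hits_above(queries, res, min_score=0.5).
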

4. Image Search — Prepare Multimodal Data

Convert PDF pages to images and generate multimodal embeddings using voyage-multimodal-3:

import base64
from io import BytesIO
import urllib.request
import fitz  # PyMuPDF
from PIL import Image


def pdf_url_to_screenshots(url: str, zoom: float = 1.0) -> list[Image.Image]:
    # Render every page of a remote PDF to a PIL image; zoom scales the resolution
    if not (url.startswith("http") and url.endswith(".pdf")):
        raise ValueError("Invalid URL")
    with urllib.request.urlopen(url) as response:
        pdf_data = response.read()
    pdf_stream = BytesIO(pdf_data)
    pdf = fitz.open(stream=pdf_stream, filetype="pdf")
    images = []
    mat = fitz.Matrix(zoom, zoom)
    for n in range(pdf.page_count):
        pix = pdf[n].get_pixmap(matrix=mat)
        img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
        images.append(img)
    pdf.close()
    return images


def image_to_base64(image):
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    img_str = base64.b64encode(buffered.getvalue())
    return img_str.decode("utf-8")


DIMENSION = 1024

pages = pdf_url_to_screenshots("https://www.fdrlibrary.org/documents/356632/390886/readingcopy.pdf", zoom=3.0)
inputs = [[img] for img in pages]

vectors = voyage_client.multimodal_embed(
    inputs, model="voyage-multimodal-3", input_type="document"
)

# Keep a displayable copy of each input: text as-is, images as base64-encoded JPEG
inputs = [i[0] if isinstance(i[0], str) else image_to_base64(i[0]) for i in inputs]
data = [
    {"id": i, "vector": vectors.embeddings[i], "data": inputs[i], "subject": "history"}
    for i in range(len(inputs))
]
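
The call above embeds every page in a single request. The Voyage AI API caps how many inputs (and tokens) one request may contain, so a long PDF may need to be split. The sketch below embeds the inputs in consecutive batches and concatenates the results in order; embed_in_batches is a hypothetical helper, and batch_size=8 is an arbitrary placeholder rather than a documented limit, so check the Voyage AI documentation for the actual values.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def embed_in_batches(
    inputs: Sequence[T],
    embed_fn: Callable[[List[T]], List[List[float]]],
    batch_size: int = 8,
) -> List[List[float]]:
    """Call embed_fn on consecutive slices of inputs and return all embeddings in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    embeddings: List[List[float]] = []
    for start in range(0, len(inputs), batch_size):
        batch = list(inputs[start:start + batch_size])
        embeddings.extend(embed_fn(batch))
    return embeddings
```

With the guide's client, embed_fn could be lambda batch: voyage_client.multimodal_embed(inputs=batch, model="voyage-multimodal-3", input_type="document").embeddings, applied to [[img] for img in pages].
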

5. Image Search — Store and Query

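Store the multimodal embeddings in Milvus and search them. The example below uses a text query; because voyage-multimodal-3 embeds text and images into the same vector space, a query input can also contain a PIL image: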

milvus_client = MilvusClient(uri="milvus_voyage_multi_demo.db")
COLLECTION_NAME = "demo_collection"

if milvus_client.has_collection(collection_name=COLLECTION_NAME):
    milvus_client.drop_collection(collection_name=COLLECTION_NAME)
milvus_client.create_collection(collection_name=COLLECTION_NAME, dimension=DIMENSION)

res = milvus_client.insert(collection_name=COLLECTION_NAME, data=data)
print(res["insert_count"])

queries = [["The consequences of a dictator's peace"]]

query_vectors = voyage_client.multimodal_embed(
    inputs=queries, model="voyage-multimodal-3", input_type="query", truncation=False
).embeddings

res = milvus_client.search(
    collection_name=COLLECTION_NAME,
    data=query_vectors,
    limit=4,
    output_fields=["data", "subject"],
)
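
Each hit returned above carries the base64-encoded JPEG stored in its "data" field. A simple way to inspect the matching pages is to decode them back to image files. The helper name save_hits_as_jpeg and the rank/id file naming below are illustrative assumptions; the files it writes can then be opened with PIL's Image.open.

```python
import base64
from pathlib import Path
from typing import Any, Dict, List, Sequence


def save_hits_as_jpeg(hits: Sequence[Dict[str, Any]], out_dir: str, field: str = "data") -> List[Path]:
    """Decode each hit's base64 JPEG and write it to out_dir/rank<N>_id<ID>.jpg.

    hits: the results for ONE query, e.g. res[0] from MilvusClient.search.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for rank, hit in enumerate(hits, start=1):
        image_bytes = base64.b64decode(hit["entity"][field])
        path = target / f"rank{rank}_id{hit['id']}.jpg"
        path.write_bytes(image_bytes)
        written.append(path)
    return written
```

Usage with the search above: save_hits_as_jpeg(res[0], "search_results").
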

Learn More
- Semantic Search with Milvus and VoyageAI — Official Milvus tutorial for semantic search with Voyage AI
- Voyage AI Embeddings and Rerankers for Search and RAG — Zilliz blog on Voyage AI embeddings and rerankers
- Using Voyage AI's Embedding Models in Zilliz Cloud Pipelines — Zilliz blog on Voyage AI in Zilliz Cloud Pipelines
- Zilliz Cloud Supports Models from OSS, VoyageAI, and OpenAI — Zilliz blog on embedding model support
- Voyage AI Documentation — Official Voyage AI documentation
