Voyage AI and Zilliz Cloud Integration
Voyage AI and Zilliz Cloud integrate to power semantic search and multimodal retrieval, combining Voyage AI's cutting-edge embedding models built on contrastive learning research with Zilliz Cloud's high-performance vector database for efficient similarity search across text and images.
What is Voyage AI
Voyage AI is a team of AI researchers focused on advancing RAG technology, with expertise stemming from over five years of research at the Stanford AI Lab and the MIT NLP group. They provide embedding models that use contrastive learning to create high-quality vector representations of text and images. Their flagship model, voyage-2, outperforms competing models on retrieval benchmarks, offering higher retrieval accuracy, extended context windows, and efficient inference. Domain-specific models are available for code, law, finance, and multilingual use cases.
By integrating with Zilliz Cloud (fully managed Milvus), Voyage AI's embedding models can be used directly within Zilliz Cloud Pipelines to convert unstructured data into searchable vector embeddings. This enables scalable semantic search, multimodal retrieval, and RAG applications, all within a turnkey, fully managed environment.
Benefits of the Voyage AI + Zilliz Cloud Integration
- State-of-the-art retrieval accuracy: Voyage AI's embedding models outperform competing models, including OpenAI's text embedding models, on retrieval benchmarks, delivering higher retrieval accuracy when paired with Zilliz Cloud's efficient similarity search.
- Multimodal search support: The voyage-multimodal-3 model supports both text and image embeddings, enabling cross-modal search applications where users can query with text to find images or vice versa, all stored and retrieved through Zilliz Cloud.
- Domain-specific models: Voyage AI offers specialized models like voyage-code-2 and voyage-law-2 for code and legal domains, with Zilliz Cloud providing the scalable storage and retrieval layer for these domain-optimized embeddings.
- Turnkey pipeline integration: Voyage AI models are available directly within Zilliz Cloud Pipelines, requiring no separate authentication or external account setup for a seamless embedding and retrieval experience.
How the Integration Works
Voyage AI serves as the embedding layer, converting text and images into high-dimensional vector representations using models like voyage-law-2 for text and voyage-multimodal-3 for multimodal content. It provides separate input types for documents and queries to optimize retrieval accuracy through its contrastive learning-based approach.
Zilliz Cloud serves as the vector database layer, storing and indexing the embeddings generated by Voyage AI for fast similarity search. The same client code can target either a local Milvus Lite deployment or the fully managed cloud service, enabling efficient retrieval across large collections of text and image embeddings.
Together, Voyage AI and Zilliz Cloud create an end-to-end semantic search solution: documents and images are embedded using Voyage AI's models and stored in Zilliz Cloud. When a user submits a query — whether text or image — Voyage AI embeds it and Zilliz Cloud performs similarity search to find the most relevant results, enabling applications like document retrieval, multimodal search, and RAG-powered question answering.
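To make the similarity-search step concrete, here is a minimal pure-Python sketch of what the vector database does conceptually: score each stored vector against the query vector and return the closest matches. The three-dimensional vectors, the document ids, and the helper names cosine_similarity and top_k are all made up for illustration. Real Voyage AI embeddings have far more dimensions, and Zilliz Cloud uses indexes rather than a brute-force scan. Cosine similarity is used here as one common metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Return the k (id, score) pairs most similar to query_vector."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vec))
        for doc_id, vec in stored.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" standing in for model output.
stored_vectors = {
    "doc_ai_history": [0.9, 0.1, 0.0],
    "doc_turing_bio": [0.6, 0.7, 0.1],
    "doc_cooking": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]

results = top_k(query_vector, stored_vectors, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In the actual integration, the embedding model produces the vectors and the search call replaces the brute-force scan, but the ranking idea is the same.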
Step-by-Step Guide
1. Install Required Packages

    $ pip install --upgrade voyageai pymilvus

2. Text Search: Generate Embeddings and Store in Milvus
Use Voyage AI's embedding model to generate vector representations and store them in Milvus:

    import voyageai
    from pymilvus import MilvusClient

    MODEL_NAME = "voyage-law-2"  # Voyage AI embedding model
    DIMENSION = 1024  # Dimension of the vectors stored in the collection

    voyage_client = voyageai.Client(api_key="<YOUR_VOYAGEAI_API_KEY>")

    docs = [
        "Artificial intelligence was founded as an academic discipline in 1956.",
        "Alan Turing was the first person to conduct substantial research in AI.",
        "Born in Maida Vale, London, Turing was raised in southern England.",
    ]

    # Embed all documents in one request.
    vectors = voyage_client.embed(texts=docs, model=MODEL_NAME, truncation=False).embeddings

    # One record per document: id, embedding, original text, and a metadata field.
    data = [
        {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"}
        for i in range(len(docs))
    ]

    # A local file path as the uri makes the client use Milvus Lite.
    milvus_client = MilvusClient(uri="milvus_voyage_demo.db")
    COLLECTION_NAME = "demo_collection"

    # Start from a clean collection on every run.
    if milvus_client.has_collection(collection_name=COLLECTION_NAME):
        milvus_client.drop_collection(collection_name=COLLECTION_NAME)
    milvus_client.create_collection(collection_name=COLLECTION_NAME, dimension=DIMENSION)

    res = milvus_client.insert(collection_name=COLLECTION_NAME, data=data)
    print(res["insert_count"])

A note on the arguments of MilvusClient:
- Setting the uri to a local file, e.g. ./milvus.db, is the most convenient method, as it automatically uses Milvus Lite to store all data in that file.
- If you have a large amount of data, you can set up a more performant Milvus server on Docker or Kubernetes.
- If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the uri and token, which correspond to the Public Endpoint and API Key in Zilliz Cloud.
3. Text Search: Perform Semantic Search
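Before running the search, note that it uses whichever backend the client from the previous step was created with. The uri and token options described there can be gathered in a small helper. The following is a minimal sketch, not part of either library: the function name milvus_client_kwargs and the environment variable names ZILLIZ_CLOUD_URI, ZILLIZ_CLOUD_TOKEN, and MILVUS_URI are assumptions chosen for illustration.

```python
import os


def milvus_client_kwargs(env=None):
    """Build keyword arguments for MilvusClient from environment-style settings.

    Falls back to a local Milvus Lite file when no cloud credentials are set.
    """
    env = os.environ if env is None else env
    endpoint = env.get("ZILLIZ_CLOUD_URI")    # hypothetical variable name
    api_key = env.get("ZILLIZ_CLOUD_TOKEN")   # hypothetical variable name
    if endpoint and api_key:
        # Zilliz Cloud: Public Endpoint as uri, API Key as token.
        return {"uri": endpoint, "token": api_key}
    # Local file (Milvus Lite) or a self-hosted server address.
    return {"uri": env.get("MILVUS_URI", "./milvus_voyage_demo.db")}


local_kwargs = milvus_client_kwargs({})
cloud_kwargs = milvus_client_kwargs(
    {"ZILLIZ_CLOUD_URI": "https://example-endpoint.invalid", "ZILLIZ_CLOUD_TOKEN": "example-key"}
)
print(local_kwargs)
print(sorted(cloud_kwargs))
```

The resulting dictionary could then be passed along as MilvusClient(**milvus_client_kwargs()), so the rest of the code stays the same whichever backend is in use.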
Generate vector embedding for the query and conduct vector search:

    queries = ["When was artificial intelligence founded?"]

    # Embed the query with the same model used for the documents.
    query_vectors = voyage_client.embed(
        texts=queries, model=MODEL_NAME, truncation=False
    ).embeddings

    res = milvus_client.search(
        collection_name=COLLECTION_NAME,
        data=query_vectors,
        limit=2,
        output_fields=["text", "subject"],
    )

    # search returns one list of hits per query vector, in query order.
    for q, hits in zip(queries, res):
        print("Query:", q)
        for hit in hits:
            print(hit)
        print("\n")

4. Image Search: Prepare Multimodal Data
Convert PDF pages to images and generate multimodal embeddings using voyage-multimodal-3:

    # Requires two extra packages: pip install pymupdf pillow
    import base64
    from io import BytesIO
    import urllib.request

    import fitz  # PyMuPDF
    from PIL import Image


    def pdf_url_to_screenshots(url: str, zoom: float = 1.0) -> list[Image.Image]:
        """Download a PDF and render each page as a PIL image."""
        # Reject anything that is not an http(s) URL ending in .pdf.
        if not (url.startswith("http") and url.endswith(".pdf")):
            raise ValueError("Invalid URL")

        with urllib.request.urlopen(url) as response:
            pdf_data = response.read()
        pdf_stream = BytesIO(pdf_data)
        pdf = fitz.open(stream=pdf_stream, filetype="pdf")

        images = []
        mat = fitz.Matrix(zoom, zoom)  # Scale factor applied when rendering
        for n in range(pdf.page_count):
            pix = pdf[n].get_pixmap(matrix=mat)
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            images.append(img)
        pdf.close()
        return images


    def image_to_base64(image):
        """Encode a PIL image as a base64 JPEG string for storage."""
        buffered = BytesIO()
        image.save(buffered, format="JPEG")
        img_str = base64.b64encode(buffered.getvalue())
        return img_str.decode("utf-8")


    DIMENSION = 1024

    pages = pdf_url_to_screenshots(
        "https://www.fdrlibrary.org/documents/356632/390886/readingcopy.pdf", zoom=3.0
    )
    # Each multimodal input is a list of content pieces; here, one image per input.
    inputs = [[img] for img in pages]
    vectors = voyage_client.multimodal_embed(inputs, model="voyage-multimodal-3")

    # Store text inputs as-is and images as base64 strings.
    inputs = [i[0] if isinstance(i[0], str) else image_to_base64(i[0]) for i in inputs]
    data = [
        {"id": i, "vector": vectors.embeddings[i], "data": inputs[i], "subject": "fruits"}
        for i in range(len(inputs))
    ]

5. Image Search: Store and Query
Store multimodal embeddings in Milvus and search with text or image queries:

    milvus_client = MilvusClient(uri="milvus_voyage_multi_demo.db")
    COLLECTION_NAME = "demo_collection"

    if milvus_client.has_collection(collection_name=COLLECTION_NAME):
        milvus_client.drop_collection(collection_name=COLLECTION_NAME)
    milvus_client.create_collection(collection_name=COLLECTION_NAME, dimension=DIMENSION)

    res = milvus_client.insert(collection_name=COLLECTION_NAME, data=data)
    print(res["insert_count"])

    queries = [["The consequences of a dictator's peace"]]

    query_vectors = voyage_client.multimodal_embed(
        inputs=queries, model="voyage-multimodal-3", truncation=False
    ).embeddings

    res = milvus_client.search(
        collection_name=COLLECTION_NAME,
        data=query_vectors,
        limit=4,
        output_fields=["data", "subject"],
    )

Learn More
- Semantic Search with Milvus and VoyageAI — Official Milvus tutorial for semantic search with Voyage AI
- Voyage AI Embeddings and Rerankers for Search and RAG — Zilliz blog on Voyage AI embeddings and rerankers
- Using Voyage AI's Embedding Models in Zilliz Cloud Pipelines — Zilliz blog on Voyage AI in Zilliz Cloud Pipelines
- Zilliz Cloud Supports Models from OSS, VoyageAI, and OpenAI — Zilliz blog on embedding model support
- Voyage AI Documentation — Official Voyage AI documentation