The guide to voyage-large-2

Voyage AI / voyage-large-2

Type: AI model, integrated with Milvus

Task: Embedding

Modality: Text

Similarity Metric: Any (Normalized)

License: Proprietary

Dimensions: 1536

Max Input Tokens: 16000

Price: $ 0.12/1M tokens
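"Similarity Metric: Any (Normalized)" means voyage-large-2 returns unit-length vectors. For unit vectors, cosine similarity equals the inner product, and squared Euclidean distance equals 2 minus twice the inner product, so COSINE, IP, and L2 all rank results identically. The minimal sketch below checks this on hypothetical toy vectors (3 dimensions for readability, not real voyage-large-2 output, which has 1536 dimensions).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    # Scale a vector to length 1, as voyage-large-2 does for its embeddings
    n = norm(a)
    return [x / n for x in a]

# Hypothetical toy vectors, normalized to unit length
query = normalize([1.0, 2.0, 2.0])
docs = [normalize([2.0, 1.0, 2.0]), normalize([0.0, 1.0, 0.0]), normalize([1.0, 2.0, 1.9])]

# Higher is better for inner product and cosine; lower is better for L2
rank_ip = sorted(range(len(docs)), key=lambda i: -dot(query, docs[i]))
rank_cos = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))
rank_l2 = sorted(range(len(docs)), key=lambda i: l2(query, docs[i]))

print(rank_ip, rank_cos, rank_l2)
```

Because the rankings agree, the metric you choose for the Milvus collection does not change which documents come back for normalized embeddings.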

Introduction to the voyage-large-2 model

voyage-large-2 is Voyage AI's general-purpose text embedding model, optimized for retrieval quality; Voyage AI reports that it outperforms OpenAI V3 Large (text-embedding-3-large). It is also well suited to tasks such as summarization, clustering, and classification.

How voyage-large-2 compares with other popular Voyage AI embedding models:


Model	Context Length (tokens)	Embedding Dimension	Description
voyage-large-2-instruct	16000	1024	Ranked at the top of the MTEB leaderboard at the time of writing. Instruction-tuned general-purpose embedding model optimized for clustering, classification, and retrieval.
voyage-multilingual-2	32000	1024	Optimized for multilingual retrieval and RAG.
voyage-code-2	16000	1536	Optimized for code retrieval (Voyage AI reports 17% better than alternatives).
voyage-large-2	16000	1536	General-purpose embedding model optimized for retrieval quality (Voyage AI reports it outperforms OpenAI V3 Large).
voyage-2	4000	1024	General-purpose embedding model optimized to balance cost, latency, and retrieval quality.
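A Milvus collection's vector dimension must match the output dimension of the model that fills it. As a minimal sketch, the table above can be encoded as a Python dict so the dimension is looked up rather than hard-coded; `ModelSpec`, `MODEL_SPECS`, and `collection_dimension` are hypothetical helper names, not part of PyMilvus or the Voyage AI SDK, and the values are copied from the table.

```python
from typing import NamedTuple

class ModelSpec(NamedTuple):
    context_length: int  # maximum input tokens
    dimension: int       # length of each embedding vector

# Values copied from the comparison table above
MODEL_SPECS = {
    "voyage-large-2-instruct": ModelSpec(16000, 1024),
    "voyage-multilingual-2": ModelSpec(32000, 1024),
    "voyage-code-2": ModelSpec(16000, 1536),
    "voyage-large-2": ModelSpec(16000, 1536),
    "voyage-2": ModelSpec(4000, 1024),
}

def collection_dimension(model_name: str) -> int:
    """Return the vector dimension a Milvus collection needs for this model."""
    try:
        return MODEL_SPECS[model_name].dimension
    except KeyError:
        raise ValueError(f"unknown model: {model_name}") from None

print(collection_dimension("voyage-large-2"))  # 1536
```

Switching models generally means creating a new collection, since vectors of different dimensions, or from different models, should not be mixed in one collection.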

How to generate vector embeddings with voyage-large-2

There are two main ways to generate vector embeddings with voyage-large-2:

PyMilvus: the Python SDK for Milvus, which integrates the voyage-large-2 model.
Voyage AI Python package: the Python SDK offered by Voyage AI.

Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:

Sign up for a Zilliz Cloud account for free.
Set up a serverless cluster and obtain the Public Endpoint and API Key.
Create a vector collection and insert your vector embeddings.
Run a semantic search on the stored embeddings.
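Both examples that follow connect with ZILLIZ_PUBLIC_ENDPOINT and ZILLIZ_API_KEY, the values obtained in step 2. Rather than pasting them into source code, you can read them from the environment. This is a minimal sketch; the environment-variable names and the `load_zilliz_credentials` helper are assumptions, not a Zilliz convention.

```python
import os

def load_zilliz_credentials(env=os.environ):
    """Return (public_endpoint, api_key) read from the environment.

    The variable names below are assumptions; use whatever your deployment defines.
    """
    names = ("ZILLIZ_PUBLIC_ENDPOINT", "ZILLIZ_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return env["ZILLIZ_PUBLIC_ENDPOINT"], env["ZILLIZ_API_KEY"]

# Usage with PyMilvus (requires `pymilvus` and the variables to be set):
# from pymilvus import MilvusClient
# uri, token = load_zilliz_credentials()
# client = MilvusClient(uri=uri, token=token)
```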

Generate vector embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search

from pymilvus import model, MilvusClient

# Replace these placeholders with your own credentials
VOYAGE_API_KEY = "your-voyage-api-key"
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"

ef = model.dense.VoyageEmbeddingFunction(
    model_name="voyage-large-2",
    api_key=VOYAGE_API_KEY,
)

# Generate embeddings for documents
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = ef.encode_documents(docs)

# Generate embeddings for queries
queries = ["When was artificial intelligence founded?",
           "Where was Alan Turing born?"]

query_embeddings = ef.encode_queries(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

COLLECTION = "documents"
# Recreate the collection so the example starts from a clean state
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=ef.dim,  # 1536 for voyage-large-2
    auto_id=True)

# Insert all documents in one call; "text" is stored as a dynamic field
client.insert(COLLECTION, [
    {"text": doc, "vector": embedding}
    for doc, embedding in zip(docs, docs_embeddings)
])

results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
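`client.search` returns one list of hits per query, each sorted best-first. Each hit is a dict-like object with "id", "distance", and an "entity" holding the requested output_fields. The minimal sketch below pulls out the best-matching text for each query. `top_texts` is a hypothetical helper, and `sample_results` is a hand-made placeholder shaped like MilvusClient output, not real search scores.

```python
from typing import Any, Dict, List, Optional

def top_texts(results: List[List[Dict[str, Any]]]) -> List[Optional[str]]:
    """Return the "text" of the highest-ranked hit per query (None if no hits).

    Milvus already orders each query's hits best-first, so the first hit wins.
    """
    return [hits[0]["entity"]["text"] if hits else None for hits in results]

# Hand-made placeholder shaped like MilvusClient.search output (ids and distances are made up)
sample_results = [
    [{"id": 1, "distance": 0.9, "entity": {"text": "doc A"}},
     {"id": 2, "distance": 0.5, "entity": {"text": "doc B"}}],
    [],
]
print(top_texts(sample_results))

# With the real client: print(top_texts(results))
```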

For more information, refer to our PyMilvus Embedding Model documentation.

Generate vector embeddings with Voyage AI Python package and insert them into Zilliz Cloud for semantic search

import voyageai
from pymilvus import MilvusClient

# Replace these placeholders with your own credentials
VOYAGE_API_KEY = "your-voyage-api-key"
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"

vo = voyageai.Client(api_key=VOYAGE_API_KEY)

# Generate embeddings for documents
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]
doc_embeddings = vo.embed(docs, model="voyage-large-2", input_type="document").embeddings

# Generate embeddings for queries (embed the queries, not the documents)
queries = ["When was artificial intelligence founded?",
           "Where was Alan Turing born?"]
query_embeddings = vo.embed(queries, model="voyage-large-2", input_type="query").embeddings

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

COLLECTION = "documents"
# Recreate the collection so the example starts from a clean state
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=1536,  # voyage-large-2 output dimension
    auto_id=True)

# Insert all documents in one call; "text" is stored as a dynamic field
client.insert(COLLECTION, [
    {"text": doc, "vector": embedding}
    for doc, embedding in zip(docs, doc_embeddings)
])

results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
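The example above embeds all documents in a single request. Larger corpora are usually split into batches because the Voyage AI API limits how many texts and tokens one request may contain; check Voyage AI's documentation for the actual limits, since the default batch size of 128 below is an assumption. This is a minimal sketch: `batched` and `embed_in_batches` are hypothetical helpers, and the embedding call is passed in as a function so the sketch runs without an API key.

```python
from typing import Callable, Iterator, List, Sequence

def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_in_batches(
    texts: Sequence[str],
    embed: Callable[[List[str]], List[List[float]]],
    batch_size: int = 128,  # assumed limit; check Voyage AI's documentation
) -> List[List[float]]:
    """Embed `texts` batch by batch, preserving input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed(list(batch)))
    return vectors

# With the Voyage client from the example above, `embed` could be:
# lambda batch: vo.embed(batch, model="voyage-large-2", input_type="document").embeddings
fake_embed = lambda batch: [[float(len(t))] for t in batch]  # stand-in for the API
print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
```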