Voyage AI / voyage-code-2
Milvus Integrated
Task: Embedding
Modality: Text
Similarity Metric: Any (Normalized)
License: Proprietary
Dimensions: 1536
Max Input Tokens: 16000
Price: $0.12/1M tokens
Introduction to voyage-code-2
voyage-code-2 is Voyage AI's text embedding model optimized for code retrieval (17% better than alternatives).
Comparing voyage-code-2 with other popular embedding models by Voyage AI:
Model | Context Length (tokens) | Embedding Dimension | Description |
voyage-large-2-instruct | 16000 | 1024 | Top of MTEB leaderboard. Instruction-tuned general-purpose embedding model optimized for clustering, classification, and retrieval. |
voyage-multilingual-2 | 32000 | 1024 | Optimized for multilingual retrieval and RAG. |
voyage-code-2 | 16000 | 1536 | Optimized for code retrieval (17% better than alternatives). |
voyage-large-2 | 16000 | 1536 | General-purpose embedding model that is optimized for retrieval quality (e.g., better than OpenAI V3 Large). |
voyage-2 | 4000 | 1024 | General-purpose embedding model optimized for balancing cost, latency, and retrieval quality. |
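As a rough sizing aid, the price and dimension figures listed for voyage-code-2 can be turned into a back-of-the-envelope estimate. The sketch below assumes vectors are stored as 32-bit floats and uses a hypothetical corpus of 100,000 snippets averaging 500 tokens each. The helper names are illustrative, and index and metadata overhead are ignored.

```python
# Figures taken from the voyage-code-2 model card.
PRICE_PER_MILLION_TOKENS_USD = 0.12
DIMENSIONS = 1536
BYTES_PER_FLOAT32 = 4  # assumption: each vector component is a 32-bit float


def embedding_cost_usd(total_tokens: int) -> float:
    """Approximate API cost of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


def raw_vector_bytes(num_vectors: int, dim: int = DIMENSIONS) -> int:
    """Raw storage for the vectors alone, without index or metadata overhead."""
    return num_vectors * dim * BYTES_PER_FLOAT32


# Hypothetical corpus: 100,000 snippets averaging 500 tokens each.
snippets, avg_tokens = 100_000, 500
print(f"embedding cost: ${embedding_cost_usd(snippets * avg_tokens):.2f}")
print(f"raw vectors:    {raw_vector_bytes(snippets) / 1e6:.1f} MB")
```

Actual token counts depend on the tokenizer, so the cost figure is only as good as the average-token estimate fed into it.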
How to create embeddings with voyage-code-2
There are two primary ways to create vector embeddings:
- PyMilvus: the Python SDK for Milvus that seamlessly integrates the voyage-code-2 model.
- Voyage AI Embedding: the Python SDK offered by Voyage AI.
Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:
- Sign up for a Zilliz Cloud account for free.
- Set up a serverless cluster and obtain the Public Endpoint and API Key.
- Create a vector collection and insert your vector embeddings.
- Run a semantic search on the stored embeddings.
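To make the final step concrete, here is a minimal brute-force sketch of what a semantic search computes. It assumes, based on the "Any (Normalized)" similarity metric listed for this model, that the embeddings are unit length, in which case ranking by dot product, cosine similarity, or Euclidean distance gives the same order. The toy 3-dimensional vectors and the helper names are illustrative only. A vector database replaces this linear scan with an index.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    """Squared Euclidean distance; equals 2 - 2 * dot(a, b) for unit vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query, vectors, k=2):
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, dot(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real voyage-code-2 vectors have 1536 dimensions.
stored = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0])]
query = normalize([0.9, 0.1, 0.0])

hits = top_k(query, stored, k=2)
print([index for index, _ in hits])
```

Because the squared distance between unit vectors is `2 - 2 * dot(a, b)`, sorting by smallest distance and sorting by largest dot product select the same neighbors.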
Generate vector embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search
from pymilvus import model, MilvusClient

# Placeholders: replace with the values from your Zilliz Cloud console
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"

ef = model.dense.VoyageEmbeddingFunction(
    model_name="voyage-code-2",
    api_key="your-voyage-api-key",
)

# Generate embeddings for documents
docs = [
    "retriever = KNNRetriever.from_texts(documents, embeddings)",
    "knn = KNeighborsClassifier(n_neighbors=3)",
    "sorted_numbers = sorted(numbers)",
    "def dynamic_programming(): print('yes')",
    "documents_embds = get_embeddings(documents)",
    "response = client.embeddings.create(input = documents, model='text-embedding-ada-002')"
]
docs_embeddings = ef.encode_documents(docs)

# Generate embeddings for queries
queries = ["Is the function dynamic_programming() implemented using dynamic programming?"]
query_embeddings = ef.encode_queries(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

# Recreate the collection so the example starts from a clean state
COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=ef.dim,
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search for the stored documents closest to the query
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
For more information, refer to our PyMilvus Embedding Model documentation.
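The value returned by client.search can then be inspected. The sketch below assumes the result is a list with one entry per query, each entry being a list of hits shaped like {"id": ..., "distance": ..., "entity": {"text": ...}}. Check this against your PyMilvus version. The format_hits helper is hypothetical, and the ids and scores in the mock data are made-up placeholders, not real model output.

```python
def format_hits(results, queries):
    """Turn search results into printable lines, one block per query."""
    lines = []
    for query, hits in zip(queries, results):
        lines.append(f"Query: {query}")
        for rank, hit in enumerate(hits, start=1):
            text = hit["entity"]["text"]
            lines.append(f"  {rank}. score={hit['distance']:.4f}  {text}")
    return lines


# Mocked results in the assumed shape; with a live cluster, pass the value
# returned by client.search instead. Scores here are placeholders.
mock_queries = ["Is the function dynamic_programming() implemented using dynamic programming?"]
mock_results = [[
    {"id": 4, "distance": 0.8123, "entity": {"text": "def dynamic_programming(): print('yes')"}},
    {"id": 3, "distance": 0.5412, "entity": {"text": "sorted_numbers = sorted(numbers)"}},
]]

for line in format_hits(mock_results, mock_queries):
    print(line)
```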
Generate vector embeddings via Voyage AI's Python SDK and insert them into Zilliz Cloud for semantic search
import voyageai
from pymilvus import MilvusClient

# Placeholders: replace with the values from your Zilliz Cloud console
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"

vo = voyageai.Client(api_key="your-voyage-api-key")

# Generate embeddings for documents
docs = [
    "retriever = KNNRetriever.from_texts(documents, embeddings)",
    "knn = KNeighborsClassifier(n_neighbors=3)",
    "sorted_numbers = sorted(numbers)",
    "def dynamic_programming(): print('yes')",
    "documents_embds = get_embeddings(documents)",
    "response = client.embeddings.create(input = documents, model='text-embedding-ada-002')"
]
docs_embeddings = vo.embed(docs, model="voyage-code-2", input_type="document").embeddings

# Generate embeddings for queries
queries = ["Is the function dynamic_programming() implemented using dynamic programming?"]
query_embeddings = vo.embed(queries, model="voyage-code-2", input_type="query").embeddings

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

# Recreate the collection so the example starts from a clean state
COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=1536,  # embedding dimension of voyage-code-2
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search for the stored documents closest to the query
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
For more information, refer to the Voyage AI Embedding Guide.
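Both examples insert one row per call, which is fine for six documents. For larger corpora, grouping rows into batches may reduce round trips, since client.insert also accepts a list of rows. The sketch below uses a hypothetical batched_rows helper and toy data standing in for docs and docs_embeddings. The batch size of 100 is an arbitrary assumption, not a recommended setting.

```python
from itertools import islice


def batched_rows(texts, embeddings, batch_size=100):
    """Yield lists of {"text", "vector"} rows, at most batch_size rows each."""
    rows = ({"text": t, "vector": e} for t, e in zip(texts, embeddings))
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


# Toy data standing in for docs and docs_embeddings.
texts = [f"snippet {i}" for i in range(5)]
vectors = [[float(i), 0.0] for i in range(5)]

for batch in batched_rows(texts, vectors, batch_size=2):
    # With a live cluster: client.insert(collection_name=COLLECTION, data=batch)
    print(len(batch), [row["text"] for row in batch])
```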