Guide to the embed-multilingual-v3.0 model

Cohere / embed-multilingual-v3.0

AI Model Milvus Integrated

Task: Embedding

Modality: Text

Similarity Metric: Any (Normalized)

License: Proprietary

Dimensions: 1024

Max Input Tokens: 512

Price: $0.10 / 1M tokens
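
The listed price and input limit make it easy to budget an embedding job with simple arithmetic. The sketch below is a rough estimate only: the corpus size and average token count are hypothetical, the helper names are made up for illustration, and splitting into non-overlapping 512-token chunks is just one possible chunking strategy.

```python
import math

PRICE_PER_MILLION_TOKENS_USD = 0.10  # price listed above
MAX_INPUT_TOKENS = 512               # per-input limit listed above

def estimate_embedding_cost(total_tokens: int) -> float:
    """Estimated cost in USD for embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD

def chunks_needed(document_tokens: int, max_tokens: int = MAX_INPUT_TOKENS) -> int:
    """Number of inputs needed if a document is split into non-overlapping chunks."""
    return math.ceil(document_tokens / max_tokens)

# Hypothetical corpus: 10,000 documents averaging 2,000 tokens each
num_docs, avg_tokens = 10_000, 2_000
total_tokens = num_docs * avg_tokens

print("chunks per document:", chunks_needed(avg_tokens))
print("estimated cost (USD):", round(estimate_embedding_cost(total_tokens), 2))
```

Actual token counts depend on the model's tokenizer, so treat the result as an order-of-magnitude figure.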

Introduction to embed-multilingual-v3.0

embed-multilingual-v3.0 is a high-performance embedding model tailored for multilingual text and is a member of Cohere's newly released Embed V3 model family. It supports 100+ languages and can be used to search within a language (e.g., search with a French query on French documents) and across languages (e.g., search with a Chinese query on Finnish documents). It is well suited to multilingual semantic search, retrieval-augmented generation (RAG), text classification, and document clustering.

The table below compares the embedding models in the Embed V3 series, with embed-multilingual-v2.0 included for reference.


Model Name	Dimensions	MTEB Performance (higher is better)	BEIR Performance (higher is better)
embed-english-v3.0	1024	64.5	55.9
embed-english-light-v3.0	384	62.0	52.0
embed-multilingual-v3.0	1024	64.0	54.6
embed-multilingual-light-v3.0	384	60.1	50.9
embed-multilingual-v2.0	768	58.5	47.1

MTEB: a broad benchmark for evaluating retrieval, classification, and clustering (56 datasets)
BEIR: a benchmark focused on out-of-domain retrieval (14 datasets)
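
The Dimensions column also determines how much space the stored vectors take, which is part of the trade-off between the full and light models. The sketch below estimates raw vector storage, assuming each dimension is stored as a 4-byte float32 and ignoring index and metadata overhead; the function name and the one-million-vector corpus are illustrative assumptions.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32

def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw storage for the vectors alone, without index or metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

# Dimensions taken from the comparison table above
model_dims = {
    "embed-multilingual-v3.0": 1024,
    "embed-multilingual-light-v3.0": 384,
    "embed-multilingual-v2.0": 768,
}

for name, dim in model_dims.items():
    mib = raw_vector_bytes(1_000_000, dim) / 2**20
    print(f"{name}: about {mib:.0f} MiB per million vectors")
```

Under these assumptions the light model needs 384/1024 of the space of the full model, at the cost of the lower benchmark scores shown in the table.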

How to create vector embeddings with embed-multilingual-v3.0

There are two primary ways to create vector embeddings:

PyMilvus: the Python SDK for Milvus, which integrates the embed-multilingual-v3.0 model.
Cohere Python SDK: the Python SDK offered by Cohere.

Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:

Sign up for a Zilliz Cloud account for free.
Set up a serverless cluster and obtain the Public Endpoint and API Key.
Create a vector collection and insert your vector embeddings.
Run a semantic search on the stored embeddings.
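
Conceptually, the last step ranks stored vectors by their similarity to the query vector. The sketch below imitates that in memory with plain Python and made-up 3-dimensional vectors; it is not how Zilliz Cloud implements search, and all names in it are illustrative. It also shows why the metric can be "Any" for normalized vectors: on unit vectors, squared L2 distance equals 2 minus twice the dot product, so the metrics produce the same ranking.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query_vec, stored, top_k=2):
    """Return the top_k (score, text) pairs, highest dot product first."""
    q = normalize(query_vec)
    scored = [(dot(q, normalize(vec)), text) for text, vec in stored.items()]
    return sorted(scored, reverse=True)[:top_k]

# Toy vectors invented for illustration; real embeddings have 1024 dimensions
stored = {
    "doc about AI history": [0.9, 0.1, 0.0],
    "doc about Turing's birthplace": [0.1, 0.9, 0.1],
    "doc about cooking": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]

for score, text in search(query, stored):
    print(f"{score:.3f}  {text}")
```

A vector database performs the same kind of ranking, typically with an approximate index so that it scales beyond a brute-force scan.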

Generate vector embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search

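from pymilvus import MilvusClient
from pymilvus.model.dense import CohereEmbeddingFunction

COHERE_API_KEY = "your-cohere-api-key"
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"

# Embedding function backed by Cohere's embed-multilingual-v3.0 model
ef = CohereEmbeddingFunction("embed-multilingual-v3.0", api_key=COHERE_API_KEY)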

docs = [
   "Artificial intelligence was founded as an academic discipline in 1956.",
   "Alan Turing was the first person to conduct substantial research in AI.",
   "Born in Maida Vale, London, Turing was raised in southern England."
]

# Generate embeddings for documents
docs_embeddings = ef.encode_documents(docs)

queries = ["When was artificial intelligence founded",
          "Where was Alan Turing born?"]

# Generate embeddings for queries
query_embeddings = ef.encode_queries(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=ef.dim,
    auto_id=True)

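# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search the collection with the query embeddings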
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
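
To read the search output, it helps to pair each query with its hits. The helper below is a minimal sketch that assumes `results` is a list with one entry per query, where each entry is a list of dict-like hits with `distance` and `entity` keys; verify this shape against your PyMilvus version. The function name `format_hits` is hypothetical, and the sample data is made up to show the assumed shape, not real search output.

```python
def format_hits(queries, results):
    """Pair each query with its hits and return printable lines."""
    lines = []
    for query, hits in zip(queries, results):
        lines.append(f"Query: {query}")
        for hit in hits:
            # Assumed hit shape: {"id": ..., "distance": ..., "entity": {"text": ...}}
            lines.append(f"  {hit['distance']:.4f}  {hit['entity']['text']}")
    return lines

# Made-up sample in the assumed result shape (scores are not real)
sample_queries = ["Where was Alan Turing born?"]
sample_results = [[
    {"id": 3, "distance": 0.71,
     "entity": {"text": "Born in Maida Vale, London, Turing was raised in southern England."}},
]]

for line in format_hits(sample_queries, sample_results):
    print(line)
```

With the code above, the call would be `format_hits(queries, results)`.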

For more information, refer to our PyMilvus Embedding Model documentation.

Generate vector embeddings via Cohere Python SDK and insert them into Zilliz Cloud for semantic search

import cohere
from pymilvus import MilvusClient

COHERE_API_KEY = "your-cohere-api-key"
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"

co = cohere.Client(COHERE_API_KEY)

docs = [
   "Artificial intelligence was founded as an academic discipline in 1956.",
   "Alan Turing was the first person to conduct substantial research in AI.",
   "Born in Maida Vale, London, Turing was raised in southern England."
]

docs_embeddings = co.embed(
    texts=docs, model="embed-multilingual-v3.0", input_type="search_document"
).embeddings

queries = ["When was artificial intelligence founded",
          "Where was Alan Turing born?"]

# Embed the queries with the same model used for the documents
query_embeddings = co.embed(
    texts=queries, model="embed-multilingual-v3.0", input_type="search_query"
).embeddings

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=1024,
    auto_id=True)

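# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search the collection with the query embeddings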
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
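
For larger corpora, the documents usually need to be sent to the embedding API in batches. The sketch below assumes a per-request limit of 96 texts, which is what Cohere's embed documentation stated at the time of writing; check the current limit before relying on it. The helper names are hypothetical, and a stand-in embedder is used so that the example runs without an API key.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

def embed_in_batches(texts, embed_batch, batch_size=96):
    """Call `embed_batch` once per batch and concatenate the results in order."""
    embeddings = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings

# Stand-in embedder for demonstration: one fake 1-dimensional vector per text
def fake_embed(batch):
    return [[float(len(text))] for text in batch]

texts = [f"document {i}" for i in range(200)]
vectors = embed_in_batches(texts, fake_embed)
print(len(vectors), "vectors from", len(list(batched(texts, 96))), "requests")
```

With the Cohere client from the code above, `embed_batch` could be `lambda batch: co.embed(texts=batch, model="embed-multilingual-v3.0", input_type="search_document").embeddings`.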