The guide to Qwen3-Embedding-8B

All models
Alibaba / Qwen3-Embedding-8B

Alibaba / Qwen3-Embedding-8B

AI Model Milvus Integrated

Task: Embedding

Modality: Text

Similarity Metric: Cosine

License: Apache 2.0

Dimensions: 4096

Max Input Tokens: 32000

Price: Free

Introduction to Qwen3-Embedding-8B

The Qwen3-Embedding-8B model is Alibaba’s 8-billion-parameter text embedding model within the Qwen3 Embedding series. Built on the dense Qwen3 architecture, it supports a 32k context length and offers strong multilingual capabilities across 100+ human and programming languages, enabling effective performance in text retrieval, code search, and cross-lingual scenarios.

Qwen3-Embedding-8B delivers state-of-the-art performance in text embedding applications and currently ranks No.1 in the MTEB multilingual leaderboard (June 5, 2025) with a score of 70.58. It also offers flexibility through vector definitions across all dimensions and user-defined instructions, allowing developers to adapt the model to specific tasks, languages, or application requirements.

How to create embeddings with Qwen3-Embedding-8B

There are two primary ways to generate vector embeddings:

PyMilvus: the Python SDK for Milvus that seamlessly integrates the Qwen3-Embedding-8B model.
PAI-EAS: A managed service for deploying custom models like Qwen3-Embedding-8B (for advanced customization).

Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:

Sign up for a Zilliz Cloud account for free.
Set up a serverless cluster and obtain the Public Endpoint and API Key.
Create a vector collection and insert your vector embeddings.
Run a semantic search on the stored embeddings.

Create embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search

from pymilvus.model.dense import SentenceTransformerEmbeddingFunction
from pymilvus import MilvusClient

# Load the Qwen3-Embedding-8B model
ef = SentenceTransformerEmbeddingFunction(
    "Qwen/Qwen3-Embedding-8B", trust_remote_code=True
)

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

# Generate embeddings for documents
docs_embeddings = ef(docs)

queries = ["When was artificial intelligence founded", "Where was Alan Turing born?"]

# Generate embeddings for queries
query_embeddings = ef(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(uri=ZILLIZ_PUBLIC_ENDPOINT, token=ZILLIZ_API_KEY)

COLLECTION = "qwen3_embedding_8b_documents"

# Drop collection if it exists
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)

# Create collection with auto-detected dimension
client.create_collection(collection_name=COLLECTION, dimension=ef.dim, auto_id=True)

# Insert documents with embeddings
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search for similar documents
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    # consistency_level="Strong",  # Strong consistency ensures accurate results but may increase latency
    output_fields=["text"],
    limit=2,
)

# Print search results
for i, query in enumerate(queries):
    print(f"\nQuery: {query}")
    for result in results[i]:
        print(f"  - {result['entity']['text']} (distance: {result['distance']:.4f})")