Google / EmbeddingGemma
Milvus Integrated
Task: Embedding
Modality: Text
Similarity Metric: Cosine, dot product
License: gemma
Dimensions: 768
Max Input Tokens: 2048
Price: Free
Introduction to EmbeddingGemma
The EmbeddingGemma model is a 308M-parameter multilingual text embedding model from Google, built on Gemma 3 (with T5Gemma initialization) and developed using the same research foundations behind the Gemini models. It is well-suited for search and retrieval tasks, as well as classification, clustering, and semantic similarity.
EmbeddingGemma supports 100+ languages and offers flexible output dimensions (from 768 down to 128) via Matryoshka Representation Learning (MRL). With a 2K token context window and a memory footprint of under 200MB when quantized, EmbeddingGemma runs efficiently on resource-limited hardware. It can be deployed on everyday devices, such as phones, laptops, and tablets, making advanced text embedding capabilities accessible in a wide range of settings.
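To illustrate how MRL truncation works in practice, here is a minimal sketch using a random unit vector as a stand-in for a real 768-dimensional embedding: because MRL training front-loads the most important information, keeping only the first 128 dimensions (and re-normalizing for cosine search) yields a smaller but still usable embedding.

```python
import numpy as np

# Stand-in for a real EmbeddingGemma output: a random 768-dim unit vector.
rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)

# MRL truncation: keep the first 128 dimensions, then re-normalize
# so cosine / dot-product similarity remains well-behaved.
small = full[:128]
small /= np.linalg.norm(small)

print(small.shape)  # (128,)
```

The same prefix-then-renormalize step applies to real model outputs; smaller dimensions trade a little retrieval quality for lower storage and faster search.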
How to create embeddings with EmbeddingGemma
There are two primary ways to generate vector embeddings:
- PyMilvus: the Python SDK for Milvus, which seamlessly integrates the EmbeddingGemma model.
- SentenceTransformer library: the Python library sentence-transformers.
Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:
- Sign up for a Zilliz Cloud account for free.
- Set up a serverless cluster and obtain the Public Endpoint and API Key.
- Create a vector collection and insert your vector embeddings.
- Run a semantic search on the stored embeddings.
Create embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search
from pymilvus.model.dense import SentenceTransformerEmbeddingFunction
from pymilvus import MilvusClient

# Load the Google EmbeddingGemma-300M model
ef = SentenceTransformerEmbeddingFunction(
    "google/embeddinggemma-300m", trust_remote_code=True
)

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

# Generate embeddings for documents
docs_embeddings = ef(docs)

queries = ["When was artificial intelligence founded", "Where was Alan Turing born?"]

# Generate embeddings for queries
query_embeddings = ef(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(uri=ZILLIZ_PUBLIC_ENDPOINT, token=ZILLIZ_API_KEY)

COLLECTION = "embeddinggemma_300m_documents"

# Drop collection if it exists
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)

# Create collection with auto-detected dimension
client.create_collection(collection_name=COLLECTION, dimension=ef.dim, auto_id=True)

# Insert documents with embeddings
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search for similar documents
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    # consistency_level="Strong",  # Strong consistency ensures accurate results but may increase latency
    output_fields=["text"],
    limit=2,
)

# Print search results
for i, query in enumerate(queries):
    print(f"\nQuery: {query}")
    for result in results[i]:
        print(f"  - {result['entity']['text']} (distance: {result['distance']:.4f})")
For more information, refer to our PyMilvus Embedding Model documentation.
Seamless AI Workflows
From embeddings to scalable AI search—Zilliz Cloud lets you store, index, and retrieve embeddings with unmatched speed and efficiency.
Try Zilliz Cloud for Free

