Google / EmbeddingGemma
Milvus Integrated
Task: Embedding
Modality: Text
Similarity Metric: Cosine, dot product
License: gemma
Dimensions: 768
Max Input Tokens: 2048
Price: Free
Introduction to EmbeddingGemma
The EmbeddingGemma model is a 308M-parameter multilingual text embedding model from Google, built on Gemma 3 (with T5Gemma initialization) and developed using the same research foundations behind the Gemini models. It is well-suited for search and retrieval tasks, as well as classification, clustering, and semantic similarity.
EmbeddingGemma supports 100+ languages and offers flexible output dimensions (from 768 down to 128) via Matryoshka Representation Learning (MRL). With a 2K token context window and a memory footprint of under 200MB when quantized, EmbeddingGemma runs efficiently on resource-limited hardware. It can be deployed on everyday devices, such as phones, laptops, and tablets, making advanced text embedding capabilities accessible in a wide range of settings.
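To illustrate how MRL truncation works in practice, here is a minimal sketch using a random unit vector as a stand-in for a real 768-dimensional embedding: because MRL training front-loads the most important information, keeping only the first 128 dimensions (and re-normalizing for cosine search) yields a smaller but still usable embedding.

```python
import numpy as np

# Stand-in for a real EmbeddingGemma output: a random 768-dim unit vector.
rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)

# MRL truncation: keep the first 128 dimensions, then re-normalize
# so cosine / dot-product similarity remains well-behaved.
small = full[:128]
small /= np.linalg.norm(small)

print(small.shape)  # (128,)
```

The same prefix-then-renormalize step applies to real model outputs; smaller dimensions trade a little retrieval quality for lower storage and faster search.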
How to create embeddings with EmbeddingGemma
There are two primary ways to generate vector embeddings:
- PyMilvus: the Python SDK for Milvus, which seamlessly integrates the EmbeddingGemma model.
- SentenceTransformer library: the Python library sentence-transformers.
Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:
- Sign up for a Zilliz Cloud account for free.
- Set up a serverless cluster and obtain the Public Endpoint and API Key.
- Create a vector collection and insert your vector embeddings.
- Run a semantic search on the stored embeddings.
Create embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search
from pymilvus.model.dense import SentenceTransformerEmbeddingFunction
from pymilvus import MilvusClient

# Load the Google EmbeddingGemma-300M model
ef = SentenceTransformerEmbeddingFunction(
    "google/embeddinggemma-300m", trust_remote_code=True
)

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

# Generate embeddings for documents
docs_embeddings = ef(docs)

queries = ["When was artificial intelligence founded", "Where was Alan Turing born?"]

# Generate embeddings for queries
query_embeddings = ef(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(uri=ZILLIZ_PUBLIC_ENDPOINT, token=ZILLIZ_API_KEY)

COLLECTION = "embeddinggemma_300m_documents"

# Drop collection if it exists
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)

# Create collection with auto-detected dimension
client.create_collection(collection_name=COLLECTION, dimension=ef.dim, auto_id=True)

# Insert documents with embeddings
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search for similar documents
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    # consistency_level="Strong",  # Strong consistency ensures accurate results but may increase latency
    output_fields=["text"],
    limit=2,
)

# Print search results
for i, query in enumerate(queries):
    print(f"\nQuery: {query}")
    for result in results[i]:
        print(f"  - {result['entity']['text']} (distance: {result['distance']:.4f})")
For more information, refer to our PyMilvus Embedding Model documentation.
Seamless AI Workflows
From embeddings to scalable AI search—Zilliz Cloud lets you store, index, and retrieve embeddings with unmatched speed and efficiency.
Try Zilliz Cloud for Free

