The guide to nomic-embed-text-v1

Nomic / nomic-embed-text-v1

AI Model Milvus Integrated

Task: Embedding

Modality: Text

Similarity Metric: Cosine

License: Apache 2.0

Dimensions: 768

Max Input Tokens: 8192

Price: Free

Introduction to nomic-embed-text-v1

The nomic-embed-text-v1 model is an open-source text-embedding model released with open training data and open training code, so it is fully reproducible and auditable. It supports an 8192-token context window and is designed for retrieval, similarity, clustering, and classification, with strong performance on both short-context and long-context tasks.
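The tasks listed above are selected per request. With the Nomic SDK this is done through the task_type argument; when running the open weights locally, the model card describes prepending a task prefix to each input instead. The sketch below illustrates that second case. The helper name add_task_prefix is hypothetical, and the prefix strings are taken from the model card as we understand it, so check them against the card before relying on them.

```python
# Task prefixes as described on the nomic-embed-text-v1 model card
# (assumption: verify against the card for the exact strings).
TASK_PREFIXES = {
    "search_document": "search_document: ",
    "search_query": "search_query: ",
    "clustering": "clustering: ",
    "classification": "classification: ",
}


def add_task_prefix(texts, task_type):
    """Return new strings with the prefix for `task_type` prepended.

    Hypothetical helper for illustration; it does not call any model.
    """
    if task_type not in TASK_PREFIXES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    prefix = TASK_PREFIXES[task_type]
    return [prefix + text for text in texts]


if __name__ == "__main__":
    print(add_task_prefix(["Where was Alan Turing born?"], "search_query"))
```

Documents and queries get different prefixes on purpose: a corpus is embedded as search_document and the questions asked against it as search_query.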

nomic-embed-text-v1 can also be used in multimodal settings through the companion model nomic-embed-vision-v1, which shares the same embedding space, so text embeddings can be compared directly with image embeddings.
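Because the model's similarity metric is cosine, comparing any two vectors from this embedding space comes down to one formula. The following is a minimal sketch in plain Python; the function name cosine_similarity and the three-dimensional toy vectors are illustrative stand-ins for real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors only; real embeddings from the model have 768 components.
    text_vec = [0.2, 0.1, 0.7]
    other_vec = [0.25, 0.05, 0.65]
    print(f"{cosine_similarity(text_vec, other_vec):.4f}")
```

Scores range from -1 to 1, and a higher score means the two inputs are closer in meaning.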

How to create embeddings with nomic-embed-text-v1

There are two primary ways to generate vector embeddings:

PyMilvus: the Python SDK for Milvus, which integrates the nomic-embed-text-v1 model.
Nomic Python SDK: its embed module generates embeddings through the Nomic Embedding API.

Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:

Sign up for a Zilliz Cloud account for free.
Set up a serverless cluster and obtain the Public Endpoint and API Key.
Create a vector collection and insert your vector embeddings.
Run a semantic search on the stored embeddings.
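For step 2, one way to keep the Public Endpoint and API Key out of source code is to read them from environment variables. This is a minimal sketch: the variable names ZILLIZ_PUBLIC_ENDPOINT and ZILLIZ_API_KEY and the helper load_zilliz_config are assumptions made for illustration, not names required by Zilliz Cloud.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ZillizConfig:
    endpoint: str
    api_key: str


def load_zilliz_config(env=None):
    """Read connection settings from a mapping (defaults to os.environ).

    The variable names below are hypothetical; use whatever your
    deployment defines.
    """
    env = os.environ if env is None else env
    missing = [
        name
        for name in ("ZILLIZ_PUBLIC_ENDPOINT", "ZILLIZ_API_KEY")
        if not env.get(name)
    ]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return ZillizConfig(
        endpoint=env["ZILLIZ_PUBLIC_ENDPOINT"],
        api_key=env["ZILLIZ_API_KEY"],
    )
```

The resulting endpoint and api_key values can then be passed as the uri and token arguments of MilvusClient.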

Create embeddings via the Nomic Python SDK and insert them into Zilliz Cloud for semantic search

from pymilvus import MilvusClient
from nomic import embed

# Prepare documents
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

# Generate embeddings for documents using nomic-embed-text-v1
# (requires Nomic API credentials, e.g. set up with `nomic login`)
docs_embeddings = embed.text(
    texts=docs, model="nomic-embed-text-v1", task_type="search_document"
)["embeddings"]

# Prepare queries
queries = ["When was artificial intelligence founded?", "Where was Alan Turing born?"]

# Generate embeddings for queries (same model, query task type)
query_embeddings = embed.text(
    texts=queries, model="nomic-embed-text-v1", task_type="search_query"
)["embeddings"]

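# Connect to Zilliz Cloud with Public Endpoint and API Key
# (placeholders: replace with the values from your own cluster)
ZILLIZ_PUBLIC_ENDPOINT = "YOUR_ZILLIZ_PUBLIC_ENDPOINT"
ZILLIZ_API_KEY = "YOUR_ZILLIZ_API_KEY"
client = MilvusClient(uri=ZILLIZ_PUBLIC_ENDPOINT, token=ZILLIZ_API_KEY)

COLLECTION = "nomic_v1_documents"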

# Drop collection if it exists
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)

# Create collection with dimension 768 (nomic-embed-text-v1 output dimension)
client.create_collection(collection_name=COLLECTION, dimension=768, auto_id=True)

# Insert documents with embeddings in a single batch
rows = [{"text": doc, "vector": embedding} for doc, embedding in zip(docs, docs_embeddings)]
client.insert(collection_name=COLLECTION, data=rows)

# Search for similar documents
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    # consistency_level="Strong",  # Strong consistency ensures accurate results but may increase latency
    output_fields=["text"],
    limit=2,
)

# Print search results
for i, query in enumerate(queries):
    print(f"\nQuery: {query}")
    for result in results[i]:
        print(f"  - {result['entity']['text']} (distance: {result['distance']:.4f})")
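
The printing loop above reads each hit's text and distance. A common next step is to keep only hits above a similarity cutoff. The sketch below assumes the collection uses the cosine metric, where a larger distance value means a closer match. The helper hits_above and the mock results are illustrative, and the scores are made up rather than real search output.

```python
def hits_above(results, min_score):
    """Per query, keep (text, score) pairs whose score is at least `min_score`."""
    filtered = []
    for hits in results:
        filtered.append(
            [
                (hit["entity"]["text"], hit["distance"])
                for hit in hits
                if hit["distance"] >= min_score
            ]
        )
    return filtered


if __name__ == "__main__":
    # Mock data in the same shape the printing loop reads; values are invented.
    mock_results = [
        [
            {"id": 1, "distance": 0.81, "entity": {"text": "doc A"}},
            {"id": 2, "distance": 0.42, "entity": {"text": "doc B"}},
        ],
        [
            {"id": 3, "distance": 0.30, "entity": {"text": "doc C"}},
        ],
    ]
    print(hits_above(mock_results, min_score=0.5))
```

A suitable cutoff depends on the data and the task, so it is best chosen by inspecting scores on a sample of real queries.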