The guide to nomic-embed-text-v1.5

Nomic / nomic-embed-text-v1.5

AI Model Milvus Integrated

Task: Embedding

Modality: Text

Similarity Metric: Cosine

License: Apache 2.0

Dimensions: 768

Max Input Tokens: 8192

Price: Free
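The card lists cosine as the similarity metric. Milvus also offers the IP (inner product) and L2 metrics, and the three are closely related. Once vectors are L2-normalized, cosine similarity equals the inner product, and squared Euclidean distance equals 2 minus 2 times the cosine, so all three rank results the same way. The sketch below checks this on two toy 4-dimensional vectors that stand in for real 768-dimensional embeddings. The vectors and helper names are purely illustrative.

```python
import math

# For unit-length vectors a and b: ||a - b||^2 = 2 - 2 * cos(a, b),
# and cos(a, b) equals the plain inner product a . b.

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def inner_product(a, b):
    """Inner product (what the IP metric computes)."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical toy vectors, normalized to unit length.
a = normalize([0.2, 0.1, 0.7, 0.3])
b = normalize([0.1, 0.4, 0.6, 0.2])

cos_ab = cosine(a, b)
print(round(cos_ab, 6), round(inner_product(a, b), 6))        # the two values match
print(round(squared_l2(a, b), 6), round(2 - 2 * cos_ab, 6))   # these match as well
```

In practice this means the metric choice matters mostly when vectors are not normalized. With COSINE, Milvus handles the normalization for you.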

Introduction to nomic-embed-text-v1.5

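The nomic-embed-text-v1.5 model is a text embedding model with an 8192-token input window that supports variable embedding sizes. It was trained with Matryoshka Representation Learning, which concentrates most of the information in the leading dimensions of each embedding. As a result, a single model can produce either full-size embeddings or shortened, compact ones, trading some retrieval quality for lower storage and faster search. The recommended sizes are 768, 512, 256, 128, and 64.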
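Shortening an embedding is a short post-processing step. The sketch below follows the order shown on the model's Hugging Face card for local use: layer-normalize the raw (mean-pooled, not yet normalized) 768-dimensional vector, keep the first `dim` values, and L2-normalize the result. It is written in plain Python as an illustration, not as Nomic's exact implementation. The function name `shrink_embedding` and the input vector are hypothetical. When you use the hosted API, the Nomic SDK's `dimensionality` argument is meant to do this for you, if your SDK version supports it.

```python
import math

# Sizes listed as recommended for nomic-embed-text-v1.5.
RECOMMENDED_DIMS = (768, 512, 256, 128, 64)

def shrink_embedding(raw, dim, eps=1e-5):
    """Layer-normalize `raw`, keep its first `dim` values, then L2-normalize.

    Hypothetical helper illustrating Matryoshka truncation; `raw` is assumed
    to be a full-size, not-yet-normalized embedding.
    """
    if dim not in RECOMMENDED_DIMS:
        raise ValueError(f"dim must be one of {RECOMMENDED_DIMS}, got {dim}")
    if len(raw) < dim:
        raise ValueError("input vector is shorter than the requested size")
    # Layer norm without learned scale/shift: zero mean, unit (biased) variance.
    mean = sum(raw) / len(raw)
    var = sum((x - mean) ** 2 for x in raw) / len(raw)
    normed = [(x - mean) / math.sqrt(var + eps) for x in raw]
    # Matryoshka truncation: keep the leading coordinates.
    head = normed[:dim]
    # Re-normalize so cosine similarity behaves the same at the smaller size.
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Hypothetical 768-d vector; a real one would come from the model.
raw_vector = [math.sin(i) for i in range(768)]
small = shrink_embedding(raw_vector, 256)
print(len(small))  # 256
```

If you store shortened embeddings, create the Milvus collection with the matching dimension (for example 256 instead of 768).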

nomic-embed-text-v1.5 can also be used for multimodal retrieval together with nomic-embed-vision-v1.5. The vision model shares the same embedding space, so text embeddings can be compared directly with image embeddings, for example to find images that match a text query.
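Because the two models share an embedding space, cross-modal search reduces to ordinary cosine ranking. The sketch below ranks a few image vectors against one text-query vector. All of the vectors, file names, and the `rank_images` helper are hypothetical 3-dimensional stand-ins. In a real setup, the query would come from nomic-embed-text-v1.5 and the images from nomic-embed-vision-v1.5, both 768-dimensional.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_images(text_vec, image_vecs):
    """Return (image_id, score) pairs, most similar first."""
    scored = [(img_id, cosine(text_vec, vec)) for img_id, vec in image_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings standing in for real text and image vectors.
query_vec = [0.9, 0.1, 0.0]
images = {
    "cat.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.1, 0.9, 0.3],
    "tree.jpg": [0.0, 0.2, 0.9],
}

for img_id, score in rank_images(query_vec, images):
    print(img_id, round(score, 3))
```

At scale, the same ranking is what a Milvus collection of image embeddings does when you search it with a text embedding.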

How to create embeddings with nomic-embed-text-v1.5

There are two primary ways to generate vector embeddings:

PyMilvus: the Python SDK for Milvus, which integrates the nomic-embed-text-v1.5 model.
Nomic Python SDK: its embed module generates embeddings through the Nomic Embedding API.
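nomic-embed-text-v1.5 expects each input to say which task it serves. The Nomic SDK takes this as the `task_type` argument (the code later in this guide uses `search_document` and `search_query`). When the open weights are run locally, for example through a sentence-transformers-based wrapper, the model card instead asks you to prepend a text prefix such as `search_query: `. The sketch below assumes the four prefixes listed on the model card. The helper name `add_task_prefix` is hypothetical.

```python
# Task prefixes listed on the nomic-embed-text-v1.5 model card (assumed here);
# when running the model locally, prepend one to every input.
TASK_PREFIXES = {
    "search_document": "search_document: ",
    "search_query": "search_query: ",
    "clustering": "clustering: ",
    "classification": "classification: ",
}

def add_task_prefix(texts, task_type):
    """Return copies of `texts` with the prefix for `task_type` prepended."""
    try:
        prefix = TASK_PREFIXES[task_type]
    except KeyError:
        raise ValueError(
            f"unknown task_type {task_type!r}; expected one of {sorted(TASK_PREFIXES)}"
        ) from None
    return [prefix + text for text in texts]

print(add_task_prefix(["Where was Alan Turing born?"], "search_query"))
# ['search_query: Where was Alan Turing born?']
```

Using the document prefix for stored passages and the query prefix for searches matters. Mixing them up, or leaving them out, tends to hurt retrieval quality.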

Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:

Sign up for a Zilliz Cloud account for free.
Set up a serverless cluster and obtain the Public Endpoint and API Key.
Create a vector collection and insert your vector embeddings.
Run a semantic search on the stored embeddings.
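For step 2, it is safer to keep the Public Endpoint and API Key out of source code. The sketch below reads them from environment variables and fails early with a clear message if either is missing. The variable names `ZILLIZ_PUBLIC_ENDPOINT` and `ZILLIZ_API_KEY` are assumptions chosen to match the placeholders in the example code. The helper `load_zilliz_config` is hypothetical.

```python
import os

def load_zilliz_config(env=None):
    """Return (endpoint, api_key) read from environment variables.

    Hypothetical helper; the variable names are an assumption, not a
    Zilliz Cloud convention.
    """
    env = os.environ if env is None else env
    names = ("ZILLIZ_PUBLIC_ENDPOINT", "ZILLIZ_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return env["ZILLIZ_PUBLIC_ENDPOINT"], env["ZILLIZ_API_KEY"]

# Usage (after exporting both variables in your shell):
#   endpoint, api_key = load_zilliz_config()
#   client = MilvusClient(uri=endpoint, token=api_key)
```

The `env` parameter lets you pass a plain dictionary, which keeps the function easy to test without touching the real environment.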

Create embeddings via the Nomic Python SDK and insert them into Zilliz Cloud for semantic search

from pymilvus import MilvusClient
from nomic import embed

# Prepare documents
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

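# Generate document embeddings with nomic-embed-text-v1.5 through the Nomic Embedding API
# (requires a Nomic API key, e.g. configured beforehand with `nomic login`)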
docs_embeddings = embed.text(
    texts=docs, model="nomic-embed-text-v1.5", task_type="search_document"
)["embeddings"]

# Prepare queries
queries = ["When was artificial intelligence founded?", "Where was Alan Turing born?"]

# Generate embeddings for queries
query_embeddings = embed.text(
    texts=queries, model="nomic-embed-text-v1.5", task_type="search_query"
)["embeddings"]

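# Connect to Zilliz Cloud with your cluster's Public Endpoint and API Key
ZILLIZ_PUBLIC_ENDPOINT = "YOUR_PUBLIC_ENDPOINT"  # placeholder: replace with your endpoint
ZILLIZ_API_KEY = "YOUR_API_KEY"  # placeholder: replace with your API key
client = MilvusClient(uri=ZILLIZ_PUBLIC_ENDPOINT, token=ZILLIZ_API_KEY)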

COLLECTION = "nomic_v1_5_documents"

# Drop collection if it exists
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)

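# Create a collection for 768-dimensional vectors (the default nomic-embed-text-v1.5 size),
# using the cosine metric recommended for this model. Quick setup adds an auto-generated
# "id" primary key and a "vector" field; extra keys such as "text" go into dynamic fields.
client.create_collection(
    collection_name=COLLECTION, dimension=768, metric_type="COSINE", auto_id=True
)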

# Insert all documents and their embeddings in a single batch
client.insert(
    collection_name=COLLECTION,
    data=[{"text": doc, "vector": emb} for doc, emb in zip(docs, docs_embeddings)],
)

# Search for similar documents
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    # consistency_level="Strong",  # uncomment to guarantee the rows just inserted are searchable (adds latency)
    output_fields=["text"],
    limit=2,
)

# Print search results
for i, query in enumerate(queries):
    print(f"\nQuery: {query}")
    for result in results[i]:
        # With the COSINE metric, "distance" is a similarity score: higher means more similar
        print(f"  - {result['entity']['text']} (distance: {result['distance']:.4f})")