The guide to all-mpnet-base-v2

Hugging Face / all-mpnet-base-v2

AI Model Milvus Integrated

Task: Embedding

Modality: Text

Similarity Metric: Any (Normalized)

License: Apache 2.0

Dimensions: 768

Max Input Tokens: 384

Price: Free

Model Overview

all-mpnet-base-v2 is a sentence and short-paragraph encoder that maps input text to a 768-dimensional vector. It is a fine-tuned version of the microsoft/mpnet-base model, trained on a dataset of 1 billion sentence pairs with a contrastive learning objective. The model is well suited to tasks such as information retrieval, clustering, and sentence similarity. By default, input longer than the 384-token limit is truncated.
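The "Any (Normalized)" similarity metric listed above follows from the embeddings being unit length: for unit vectors, the inner product equals the cosine similarity, and the squared Euclidean distance equals 2 minus twice the inner product, so all three metrics rank neighbors identically. The sketch below illustrates this with toy 3-dimensional vectors standing in for real 768-dimensional embeddings; the vector values and names such as doc_a are made up for illustration and are not model output.

```python
import math


def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, computed without assuming unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy stand-ins for embeddings (illustrative values only)
query = normalize([0.2, 0.9, 0.4])
candidates = {
    "doc_a": normalize([0.1, 0.8, 0.5]),
    "doc_b": normalize([0.9, 0.1, 0.2]),
    "doc_c": normalize([0.3, 0.7, 0.1]),
}

# Higher is better for inner product and cosine; lower is better for L2
by_ip = sorted(candidates, key=lambda k: dot(query, candidates[k]), reverse=True)
by_cos = sorted(candidates, key=lambda k: cosine(query, candidates[k]), reverse=True)
by_l2 = sorted(candidates, key=lambda k: squared_l2(query, candidates[k]))

print(by_ip, by_cos, by_l2)
```

In practice this means the choice among inner product, cosine, and L2 for a collection of normalized embeddings can be made on performance grounds rather than on ranking quality.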

For more details, check out this post: All-Mpnet-Base-V2: Enhancing Sentence Embedding with AI

How to create embeddings using all-mpnet-base-v2

There are two primary ways to generate vector embeddings:

PyMilvus: the Python SDK for Milvus, which integrates the all-mpnet-base-v2 model.
SentenceTransformer library: the sentence-transformers Python library.

Once the vector embeddings are created, they can be stored in a vector database like Zilliz Cloud (a fully managed vector database powered by Milvus) and used for semantic similarity search.

Here are four key steps:

Sign up for a Zilliz Cloud account for free.
Set up a serverless cluster and obtain the Public Endpoint and API Key.
Create a vector collection and insert your vector embeddings.
Run a semantic search on the stored embeddings.
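The Public Endpoint and API Key from step 2 are what the client needs to connect. One way to keep them out of source code is to read them from environment variables and fail early if either is missing. The helper below is a minimal sketch; the function name load_zilliz_credentials and the environment variable names ZILLIZ_PUBLIC_ENDPOINT and ZILLIZ_API_KEY are assumptions chosen to match the placeholders used in the code samples on this page.

```python
import os


def load_zilliz_credentials(env=None):
    """Build MilvusClient keyword arguments from environment variables.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    The variable names are an assumed convention, not a Zilliz requirement.
    """
    env = os.environ if env is None else env
    names = ("ZILLIZ_PUBLIC_ENDPOINT", "ZILLIZ_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"uri": env["ZILLIZ_PUBLIC_ENDPOINT"], "token": env["ZILLIZ_API_KEY"]}
```

With the variables set, the client could then be created as MilvusClient(**load_zilliz_credentials()).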

Create embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search

# Requires: pip install "pymilvus[model]"
import os

from pymilvus import MilvusClient
from pymilvus.model.dense import SentenceTransformerEmbeddingFunction

# Zilliz Cloud credentials, read here from environment variables
# (set them to your cluster's Public Endpoint and API Key)
ZILLIZ_PUBLIC_ENDPOINT = os.environ["ZILLIZ_PUBLIC_ENDPOINT"]
ZILLIZ_API_KEY = os.environ["ZILLIZ_API_KEY"]

ef = SentenceTransformerEmbeddingFunction("sentence-transformers/all-mpnet-base-v2")

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]
# Generate embeddings for documents
docs_embeddings = ef(docs)

queries = ["When was artificial intelligence founded?",
           "Where was Alan Turing born?"]
# Generate embeddings for queries
query_embeddings = ef(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

# Recreate the collection so the example starts from a clean state
COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=ef.dim,
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search with both query embeddings and return the stored text of each hit
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])

Check out this documentation for more details about PyMilvus integration with all-mpnet-base-v2.
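The search call above returns one list of hits per query. The sketch below shows one way to read that structure, assuming each hit can be accessed like a dict with "distance" and "entity" keys, which is how MilvusClient results are commonly consumed; the exact return type may vary between PyMilvus versions. The helper name summarize_results is hypothetical, and sample_results is a hand-written stand-in with placeholder scores, not real search output.

```python
def summarize_results(queries, results):
    """Pair each query with the (text, distance) of its hits, in returned order."""
    summary = []
    for query, hits in zip(queries, results):
        rows = [(hit["entity"]["text"], hit["distance"]) for hit in hits]
        summary.append((query, rows))
    return summary


queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

# Hand-written stand-in for the value returned by client.search.
# The distances are placeholders, not scores produced by the model.
sample_results = [
    [{"id": 1, "distance": 0.75,
      "entity": {"text": "Artificial intelligence was founded as an academic discipline in 1956."}}],
    [{"id": 3, "distance": 0.5,
      "entity": {"text": "Born in Maida Vale, London, Turing was raised in southern England."}}],
]

for query, rows in summarize_results(queries, sample_results):
    print(query)
    for text, distance in rows:
        print(f"  {distance:.2f}  {text}")
```

Replacing sample_results with the results variable from the code above would print the actual hits for each query.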

Create embeddings via the SentenceTransformer library and insert them into Zilliz Cloud for semantic search

import os

from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# Zilliz Cloud credentials, read here from environment variables
# (set them to your cluster's Public Endpoint and API Key)
ZILLIZ_PUBLIC_ENDPOINT = os.environ["ZILLIZ_PUBLIC_ENDPOINT"]
ZILLIZ_API_KEY = os.environ["ZILLIZ_API_KEY"]

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]
# Generate unit-length embeddings for documents
docs_embeddings = model.encode(docs, normalize_embeddings=True)

queries = ["When was artificial intelligence founded?",
           "Where was Alan Turing born?"]
# Generate unit-length embeddings for queries
query_embeddings = model.encode(queries, normalize_embeddings=True)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

# Recreate the collection so the example starts from a clean state
COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=768,  # output dimension of all-mpnet-base-v2
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search with both query embeddings and return the stored text of each hit
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
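The loop above sends one insert request per document. For larger datasets it may be preferable to build all rows first and insert them in a single call, since MilvusClient.insert also accepts a list of dicts. The helper below is a sketch of that preparation step: the name build_rows is hypothetical, and the dimension check against 768 is an added safeguard rather than something the original example does.

```python
EXPECTED_DIM = 768  # output dimension of all-mpnet-base-v2


def build_rows(docs, embeddings, dim=EXPECTED_DIM):
    """Return a list of {"text", "vector"} rows ready for a single insert call.

    Each embedding is converted to a plain list of floats, so NumPy arrays
    and Python lists are both accepted.
    """
    if len(docs) != len(embeddings):
        raise ValueError(
            f"{len(docs)} documents but {len(embeddings)} embeddings")
    rows = []
    for doc, embedding in zip(docs, embeddings):
        vector = [float(x) for x in embedding]
        if len(vector) != dim:
            raise ValueError(
                f"expected a {dim}-dimensional vector, got {len(vector)}")
        rows.append({"text": doc, "vector": vector})
    return rows
```

The rows could then be inserted with client.insert(collection_name=COLLECTION, data=build_rows(docs, docs_embeddings)).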
