Jina AI / jina-embeddings-v2-base-en
Milvus Integrated
Task: Embedding
Modality: Text
Similarity Metric: Any (Normalized)
License: Apache 2.0
Dimensions: 768
Max Input Tokens: 8192
Price: Free
Introduction to Jina Embeddings v2 Models
Jina Embeddings v2 models are designed to handle long documents, with a maximum input size of 8,192 tokens. That is sixteen times the 512-token limit typical of Jina Embeddings v1 and the widely used SBERT models. As of August 2024, Jina Embeddings v2 has four variants, each catering to different embedding needs:
- jina-embeddings-v2-small-en
- jina-embeddings-v2-base-en
- jina-embeddings-v2-base-zh
- jina-embeddings-v2-base-de
Introduction to jina-embeddings-v2-base-en
jina-embeddings-v2-base-en is an English monolingual embedding model that supports sequences of up to 8,192 tokens. It is the base (medium-sized) variant in the Jina Embeddings v2 family, with 137 million parameters and 768-dimensional output embeddings.
Comparing jina-embeddings-v2-base-en with other Jina embedding models:
Model | Parameter Size | Embedding Dimension | Description |
---|---|---|---|
jina-embeddings-v2-small-en | 33M | 512 | English monolingual embeddings |
jina-embeddings-v2-base-en | 137M | 768 | English monolingual embeddings |
jina-embeddings-v2-base-zh | 161M | 768 | Chinese-English Bilingual embeddings |
jina-embeddings-v2-base-de | 161M | 768 | German-English Bilingual embeddings |
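The embedding dimension in the table translates directly into storage cost. As a rough sketch, assuming vectors are stored as 32-bit floats (4 bytes per dimension, which is an assumption; index structures and metadata add overhead that is not modelled), the raw size of a collection can be estimated as follows. The function name `raw_vector_bytes` is hypothetical.

```python
# Rough estimate of raw vector storage, assuming float32 (4 bytes per dimension).
# Index structures and metadata add overhead that is not modelled here.
BYTES_PER_FLOAT32 = 4

# Embedding dimensions taken from the comparison table above.
DIMENSIONS = {
    "jina-embeddings-v2-small-en": 512,
    "jina-embeddings-v2-base-en": 768,
}


def raw_vector_bytes(model_name: str, num_vectors: int) -> int:
    """Return the raw byte size of `num_vectors` float32 embeddings."""
    return DIMENSIONS[model_name] * BYTES_PER_FLOAT32 * num_vectors


for name in DIMENSIONS:
    size_mib = raw_vector_bytes(name, 1_000_000) / 2**20
    print(f"{name}: {size_mib:.0f} MiB per million vectors")
```

Under this assumption, a single 768-dimensional vector takes 3,072 bytes, which is 1.5 times the 2,048 bytes of a 512-dimensional vector from the small variant.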
How to create embeddings with jina-embeddings-v2-base-en
There are two primary ways to use the jina-embeddings-v2-base-en model to generate vector embeddings:
- PyMilvus: the Python SDK for Milvus, which seamlessly integrates the jina-embeddings-v2-base-en model.
- SentenceTransformer library: the Python library sentence-transformers.
Generate vector embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search
import os

from pymilvus import MilvusClient
from pymilvus.model.dense import SentenceTransformerEmbeddingFunction

# trust_remote_code=True allows the model's custom code to be loaded
ef = SentenceTransformerEmbeddingFunction("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

# Generate embeddings for documents
docs_embeddings = ef(docs)

queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

# Generate embeddings for queries
query_embeddings = ef(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
# (read from environment variables here as an assumption; substitute your own values)
ZILLIZ_PUBLIC_ENDPOINT = os.environ["ZILLIZ_PUBLIC_ENDPOINT"]
ZILLIZ_API_KEY = os.environ["ZILLIZ_API_KEY"]

client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

COLLECTION = "documents"

# Start from a clean collection so the example can be re-run
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)

client.create_collection(
    collection_name=COLLECTION,
    dimension=ef.dim,
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search with all query embeddings and return the stored text of each hit
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
For more information, refer to our PyMilvus Embedding Model documentation.
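The PyMilvus snippet above ends at `client.search` without showing how to read the hits. As a hedged sketch, assume `results` is a list with one entry per query, where each entry is a best-first list of dict-like hits carrying `distance` and `entity` keys. This is the shape `MilvusClient.search` returns in recent PyMilvus versions, but it is worth verifying against your installed version. The helper below pairs each query with its top match. `best_matches` is a hypothetical name, and the values in `sample_results` are invented for illustration, not real search output.

```python
def best_matches(queries, results):
    """Pair each query with the text and score of its top-ranked hit.

    Assumes `results[i]` is the list of hits for `queries[i]`, ordered
    best-first, and that each hit exposes "distance" and "entity" keys.
    """
    matches = []
    for query, hits in zip(queries, results):
        if not hits:  # a query may return no hits at all
            matches.append((query, None, None))
            continue
        top = hits[0]
        matches.append((query, top["entity"]["text"], top["distance"]))
    return matches


# Illustrative stand-in for search output; the scores are invented.
sample_queries = ["When was artificial intelligence founded"]
sample_results = [[
    {"id": 1, "distance": 0.9,
     "entity": {"text": "Artificial intelligence was founded as an academic discipline in 1956."}},
    {"id": 2, "distance": 0.5,
     "entity": {"text": "Alan Turing was the first person to conduct substantial research in AI."}},
]]

for query, text, score in best_matches(sample_queries, sample_results):
    print(f"{query!r} -> {text!r} (score={score})")
```

With a live collection, the same helper could be called as `best_matches(queries, results)` directly after the search.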
Generate vector embeddings via SentenceTransformer and insert them into Zilliz Cloud for semantic search
import os

from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# trust_remote_code=True allows the model's custom code to be loaded
model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

# Generate embeddings for documents
docs_embeddings = model.encode(docs, normalize_embeddings=True)

# The model is English-only, so the queries are written in English
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]

# Generate embeddings for queries
query_embeddings = model.encode(queries, normalize_embeddings=True)

# Connect to Zilliz Cloud with Public Endpoint and API Key
# (read from environment variables here as an assumption; substitute your own values)
ZILLIZ_PUBLIC_ENDPOINT = os.environ["ZILLIZ_PUBLIC_ENDPOINT"]
ZILLIZ_API_KEY = os.environ["ZILLIZ_API_KEY"]

client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

COLLECTION = "documents"

# Start from a clean collection so the example can be re-run
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)

# 768 is the embedding dimension of jina-embeddings-v2-base-en
client.create_collection(
    collection_name=COLLECTION,
    dimension=768,
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search with all query embeddings and return the stored text of each hit
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
For more information, refer to the SentenceTransformer documentation.
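The SentenceTransformer example passes `normalize_embeddings=True`, which is what lets the similarity metric be listed as "Any (Normalized)". For unit-length vectors, cosine similarity equals the inner product, and the squared Euclidean distance equals 2 - 2 * (inner product), so all three metrics rank neighbors in the same order. The sketch below checks this in pure Python with made-up 3-dimensional vectors. These are toy values, not real model outputs, and the helper names are chosen for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, computed without assuming unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors for illustration only.
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 0.5, 1.0])

ip = dot(a, b)
print(math.isclose(cosine(a, b), ip))              # cosine vs. inner product
print(math.isclose(squared_l2(a, b), 2 - 2 * ip))  # squared L2 vs. 2 - 2 * IP
```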