Jina AI / jina-embeddings-v2-base-de
Milvus Integrated
Task: Embedding
Modality: Text
Similarity Metric: Any (Normalized)
License: Apache 2.0
Dimensions: 768
Max Input Tokens: 8192
Price: Free
Introduction to Jina Embedding v2 Models
Jina Embeddings v2 models are designed to handle long documents, with a maximum input size of 8,192 tokens. As of October 2024, the Jina Embeddings v2 family includes several variants, each catering to different embedding needs; they are compared in the table further down this page.
What is jina-embeddings-v2-base-de
jina-embeddings-v2-base-de is a bilingual (German/English) text embedding model that can process up to 8,192 tokens per sequence. It is built on a specialized BERT architecture (called JinaBERT) and is intended for monolingual and cross-lingual applications, supporting mixed German-English input without bias.
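Inputs longer than the 8,192-token limit have to be split or truncated before they are embedded. The sketch below shows one simple option: fixed-size windows with some overlap. It counts whitespace-separated words as a rough stand-in for tokens, which is an assumption; the model's real tokenizer generally produces more tokens than words, so the default budget is set well below 8,192. The function `split_into_chunks` and its parameters are illustrative and not part of any Jina or Milvus API.

```python
def split_into_chunks(text, max_words=4000, overlap=200):
    """Split text into overlapping windows of at most max_words words.

    Words are only a proxy for tokens (assumption), so keep max_words
    comfortably below the model's 8,192-token limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in the range [0, max_words)")

    words = text.split()
    if not words:
        return []

    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Tiny demonstration with a 10-word text and 4-word windows
demo = " ".join(f"w{i}" for i in range(10))
for chunk in split_into_chunks(demo, max_words=4, overlap=1):
    print(chunk)
```

Each chunk can then be embedded separately and stored as its own row, ideally with a reference back to the source document.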
Comparing jina-embeddings-v2-base-de with other Jina embedding models:
Model | Parameter Size | Embedding Dimension | Supported Text |
---|---|---|---|
jina-embeddings-v3 | 570M | Flexible (default: 1024) | Multilingual text embeddings; supports 94 languages in total |
jina-embeddings-v2-small-en | 33M | 512 | English monolingual embeddings |
jina-embeddings-v2-base-en | 137M | 768 | English monolingual embeddings |
jina-embeddings-v2-base-zh | 161M | 768 | Chinese-English Bilingual embeddings |
jina-embeddings-v2-base-de | 161M | 768 | German-English Bilingual embeddings |
jina-embeddings-v2-base-code | 161M | 768 | English and programming languages |
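The embedding dimension in the table has a direct effect on storage. As a rough sketch, assuming vectors are stored as uncompressed 32-bit floats (4 bytes per dimension) and ignoring index and metadata overhead, the raw size can be estimated like this. The helper `raw_vector_bytes` is a hypothetical name used only for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage


def raw_vector_bytes(num_vectors, dim):
    """Raw storage for num_vectors float32 vectors of the given dimension."""
    return num_vectors * dim * BYTES_PER_FLOAT32


# Dimensions taken from the comparison table
models = [
    ("jina-embeddings-v2-small-en", 512),
    ("jina-embeddings-v2-base-de", 768),
    ("jina-embeddings-v3 (default size)", 1024),
]

for name, dim in models:
    mib = raw_vector_bytes(1_000_000, dim) / 2**20
    print(f"{name}: about {mib:.0f} MiB per million vectors")
```

Under these assumptions, a single 768-dimensional vector takes 3,072 bytes.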
How to create embeddings using jina-embeddings-v2-base-de
There are two primary ways to generate vector embeddings:
- PyMilvus: the Python SDK for Milvus, which seamlessly integrates the jina-embeddings-v2-base-de model.
- SentenceTransformer library: the Python library sentence-transformers.
Once the vector embeddings are created, they can be stored in a vector database like Zilliz Cloud (a fully managed vector database powered by Milvus) and used for semantic similarity search.
Here are four key steps:
- Sign up for a Zilliz Cloud account for free.
- Set up a serverless cluster and obtain the Public Endpoint and API Key.
- Create a vector collection and insert your vector embeddings.
- Run a semantic search on the stored embeddings.
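Conceptually, the semantic search in the last step ranks stored vectors by how close they are to a query vector. The toy sketch below does this by brute force in plain Python with made-up 3-dimensional vectors; a vector database performs the same kind of ranking at scale with the help of indexes. The function names and example vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return (index, score) pairs for the k vectors most similar to query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for stored document embeddings
stored = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.1],
]
query = [1.0, 0.0, 0.0]

ranking = top_k(query, stored, k=2)
print([index for index, _ in ranking])
```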
Create embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search
```python
import os

from pymilvus import MilvusClient
from pymilvus.model.dense import SentenceTransformerEmbeddingFunction

# trust_remote_code=True is required because the model ships custom JinaBERT code
ef = SentenceTransformerEmbeddingFunction("jinaai/jina-embeddings-v2-base-de", trust_remote_code=True)

docs = [
    "Die Künstliche Intelligenz wurde 1956 als akademische Disziplin gegründet.",
    "Alan Turing war die erste Person, die wesentliche Forschung im Bereich der Künstlichen Intelligenz betrieb.",
    "Geboren in Maida Vale, London, wuchs Turing in Südengland auf.",
]

# Generate embeddings for documents
docs_embeddings = ef(docs)

queries = [
    "Wann wurde die Künstliche Intelligenz gegründet?",
    "Wo wurde Alan Turing geboren?",
]

# Generate embeddings for queries
query_embeddings = ef(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
# (read from environment variables here; the variable names are an assumption)
ZILLIZ_PUBLIC_ENDPOINT = os.environ["ZILLIZ_PUBLIC_ENDPOINT"]
ZILLIZ_API_KEY = os.environ["ZILLIZ_API_KEY"]
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY,
)

COLLECTION = "documents"

# Drop any existing collection with the same name so the example starts clean
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)

client.create_collection(
    collection_name=COLLECTION,
    dimension=ef.dim,
    auto_id=True,
)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Run a semantic search for every query and return the stored text
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"],
)
```
For details, refer to our PyMilvus Embedding Model documentation.
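To inspect what a search returned, the hits can be printed per query. This sketch assumes the result shape that `MilvusClient.search` returns in recent PyMilvus versions: one list of hits per query, where each hit behaves like a dict with `id`, `distance`, and `entity` keys. The sample data is a made-up placeholder (not real model output) so the snippet runs without a cluster, and the helper name `format_hits` is hypothetical.

```python
def format_hits(queries, results):
    """Turn search results into printable lines, grouped by query."""
    lines = []
    for query, hits in zip(queries, results):
        lines.append(f"Query: {query}")
        for hit in hits:
            text = hit["entity"]["text"]
            lines.append(f"  {hit['distance']:.3f}  {text}")
    return lines


# Placeholder results with the assumed structure; the distances are invented
sample_queries = ["Wo wurde Alan Turing geboren?"]
sample_results = [[
    {"id": 3, "distance": 0.8, "entity": {"text": "Geboren in Maida Vale, London, wuchs Turing in Südengland auf."}},
    {"id": 2, "distance": 0.5, "entity": {"text": "Alan Turing war die erste Person, die wesentliche Forschung im Bereich der Künstlichen Intelligenz betrieb."}},
]]

for line in format_hits(sample_queries, sample_results):
    print(line)
```

With a live cluster, the same helper would be called as `format_hits(queries, results)`.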
Create embeddings via the SentenceTransformer library and insert them into Zilliz Cloud for semantic search
```python
import os

from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# trust_remote_code=True is required because the model ships custom JinaBERT code
model = SentenceTransformer("jinaai/jina-embeddings-v2-base-de", trust_remote_code=True)

docs = [
    "Die Künstliche Intelligenz wurde 1956 als akademische Disziplin gegründet.",
    "Alan Turing war die erste Person, die wesentliche Forschung im Bereich der Künstlichen Intelligenz betrieb.",
    "Geboren in Maida Vale, London, wuchs Turing in Südengland auf.",
]

# Generate embeddings for documents
docs_embeddings = model.encode(docs, normalize_embeddings=True)

queries = [
    "Wann wurde die Künstliche Intelligenz gegründet?",
    "Wo wurde Alan Turing geboren?",
]

# Generate embeddings for queries
query_embeddings = model.encode(queries, normalize_embeddings=True)

# Connect to Zilliz Cloud with Public Endpoint and API Key
# (read from environment variables here; the variable names are an assumption)
ZILLIZ_PUBLIC_ENDPOINT = os.environ["ZILLIZ_PUBLIC_ENDPOINT"]
ZILLIZ_API_KEY = os.environ["ZILLIZ_API_KEY"]
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY,
)

COLLECTION = "documents"

# Drop any existing collection with the same name so the example starts clean
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)

client.create_collection(
    collection_name=COLLECTION,
    dimension=768,  # embedding dimension of jina-embeddings-v2-base-de
    auto_id=True,
)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Run a semantic search for every query and return the stored text
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"],
)
```
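The SentenceTransformer snippet requests normalized embeddings, so every vector has unit length. This is why the model card can list the similarity metric as "Any (Normalized)": for unit vectors, the inner product equals the cosine similarity, and the squared Euclidean distance equals 2 minus twice the cosine similarity, so all three metrics yield the same ranking. The sketch below checks these identities on made-up vectors using only the standard library; the helper names are illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up vectors, normalized the same way the embeddings are
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

ip = inner_product(a, b)

# Both assertions pass silently when the identities hold
assert math.isclose(ip, cosine(a, b))
assert math.isclose(squared_l2(a, b), 2 - 2 * ip)
```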