Jina AI / jina-embeddings-v3
Milvus Integrated
Task: Embedding
Modality: Text
Similarity Metric: Any (Normalized)
License: CC BY-NC 4.0
Dimensions: 1024
Max Input Tokens: 8192
Price:
Introduction to jina-embeddings-v3
The jina-embeddings-v3 model is Jina AI's multilingual text embedding model, with 570 million parameters and a maximum input length of 8192 tokens. It is built for multilingual data processing and long-context retrieval, and is reported to achieve state-of-the-art (SOTA) performance across 94 languages. The model produces embeddings suited to a range of tasks, including query-document retrieval, clustering, classification, and text matching.
jina-embeddings-v3 also supports Matryoshka Embeddings, which let you choose the output embedding size to fit your needs. The default output dimension is 1024, but you can reduce it to 32, 64, 128, 256, 512, or 768 with only a modest loss in performance, trading a little accuracy for lower storage and faster search.
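As a rough illustration of what Matryoshka truncation means in practice, the sketch below keeps the first components of a full-size vector and re-normalizes the result so that cosine or inner-product similarity still behaves as expected. The helper name `truncate_embedding` and the stand-in vector are hypothetical; a real embedding would come from the model.

```python
import math

# Output sizes named in the text above; 1024 is the model default.
SUPPORTED_DIMS = (32, 64, 128, 256, 512, 768, 1024)


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and L2-normalize the result."""
    if dim not in SUPPORTED_DIMS:
        raise ValueError(f"dim must be one of {SUPPORTED_DIMS}, got {dim}")
    if dim > len(vector):
        raise ValueError("dim exceeds the length of the input vector")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero vector: nothing to normalize
    return [x / norm for x in head]


# Stand-in for a real 1024-dimensional embedding (hypothetical values).
full = [math.sin(i + 1) for i in range(1024)]
small = truncate_embedding(full, 256)
print(len(small))  # prints 256
```

Truncating on the client side like this is only one option; libraries that wrap the model may expose the output size as a parameter instead.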
Compare jina-embeddings-v3 with Jina v2 models:
Model | Parameter Size | Embedding Dimension | Language Support |
---|---|---|---|
jina-embeddings-v3 | 570M | flexible embedding size (Default: 1024) | multilingual text embeddings; supports 94 languages in total |
jina-embeddings-v2-small-en | 33M | 512 | English monolingual embeddings |
jina-embeddings-v2-base-en | 137M | 768 | English monolingual embeddings |
jina-embeddings-v2-base-zh | 161M | 768 | Chinese-English Bilingual embeddings |
jina-embeddings-v2-base-de | 161M | 768 | German-English Bilingual embeddings |
jina-embeddings-v2-base-code | 161M | 768 | English and programming languages |
How to create embeddings with jina-embeddings-v3
There are two primary ways to generate vector embeddings:
- PyMilvus: the Python SDK for Milvus, which integrates the jina-embeddings-v3 model.
- SentenceTransformer library: the Python library sentence-transformers.
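For the second option, a minimal sketch might look like the following. It assumes the `sentence-transformers` package is installed and that the model is published on Hugging Face as `jinaai/jina-embeddings-v3`. The task names come from the Jina AI model card rather than from this page, and the function names `load_model` and `embed` are hypothetical.

```python
from functools import lru_cache

# Task names as listed on the Jina AI model card (an assumption, not from this page).
TASKS = {
    "retrieval.query",
    "retrieval.passage",
    "separation",
    "classification",
    "text-matching",
}


@lru_cache(maxsize=None)
def load_model(dim=1024):
    """Load the model once per output size; the first call downloads the weights."""
    # Imported lazily so this module loads even without the dependency installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(
        "jinaai/jina-embeddings-v3",
        trust_remote_code=True,  # the model ships custom modeling code
        truncate_dim=dim,        # Matryoshka output size
    )


def embed(texts, task="retrieval.passage", dim=1024):
    """Encode `texts` for the given task and return one vector per text."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    model = load_model(dim)
    return model.encode(texts, task=task, prompt_name=task)
```

A typical call would be `embed(["What is a vector database?"], task="retrieval.query")` for queries and `task="retrieval.passage"` for the documents being indexed.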
Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:
- Sign up for a Zilliz Cloud account for free.
- Set up a serverless cluster and obtain the Public Endpoint and API Key.
- Create a vector collection and insert your vector embeddings.
- Run a semantic search on the stored embeddings.
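The last two steps can be sketched with PyMilvus's `MilvusClient`. This is an outline under stated assumptions rather than the official walkthrough: the collection name `jina_v3_demo` and the helper names are hypothetical, and `uri` and `token` stand for the Public Endpoint and API Key obtained in step two.

```python
COLLECTION = "jina_v3_demo"  # hypothetical collection name


def build_rows(texts, vectors):
    """Pair each text with its vector in the row format MilvusClient.insert accepts."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    return [
        {"id": i, "vector": [float(x) for x in vec], "text": text}
        for i, (text, vec) in enumerate(zip(texts, vectors))
    ]


def index_and_search(uri, token, texts, vectors, query_vector, top_k=3):
    """Create the collection if needed, insert the rows, and run one search."""
    # Imported lazily so this module loads even without pymilvus installed.
    from pymilvus import MilvusClient

    client = MilvusClient(uri=uri, token=token)
    if not client.has_collection(COLLECTION):
        client.create_collection(
            collection_name=COLLECTION,
            dimension=len(vectors[0]),  # 1024 unless a smaller size was chosen
            metric_type="COSINE",
        )
    client.insert(collection_name=COLLECTION, data=build_rows(texts, vectors))
    return client.search(
        collection_name=COLLECTION,
        data=[[float(x) for x in query_vector]],
        limit=top_k,
        output_fields=["text"],
    )
```

Because the embeddings are normalized, cosine and inner-product metrics should rank results the same way; `COSINE` is used here only as a reasonable default.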
Create embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search
A step-by-step guide coming soon.
For more details, check out this Jina AI documentation page.
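Until that guide is available, the following sketch shows one way the PyMilvus integration could be used to generate embeddings. It assumes the optional model package for PyMilvus is installed and that a Jina AI API key is available in the `JINAAI_API_KEY` environment variable. The wrapper names are hypothetical, and the constructor arguments may differ between PyMilvus versions.

```python
import os


def make_jina_embedder(api_key, dimensions=1024):
    """Build the PyMilvus embedding function for jina-embeddings-v3."""
    if not api_key:
        raise ValueError("a Jina AI API key is required")
    # Imported lazily so this module loads even without the dependency installed.
    from pymilvus.model.dense import JinaEmbeddingFunction

    # Argument names follow the Milvus documentation at the time of writing;
    # treat them as an assumption and check them against your installed version.
    return JinaEmbeddingFunction(
        model_name="jina-embeddings-v3",
        api_key=api_key,
        task="retrieval.passage",
        dimensions=dimensions,
    )


def embed_documents(docs, api_key=None):
    """Return one embedding per document, calling the hosted Jina AI API."""
    key = api_key or os.environ.get("JINAAI_API_KEY", "")
    embedder = make_jina_embedder(key)
    return embedder.encode_documents(docs)
```

The resulting vectors can then be inserted into a collection in the same way as embeddings produced by any other method.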
Further Reading
- Training Text Embeddings with Jina AI
- General Text-Image Representation Learning for Search and Multimodal RAG
- Choosing the Right Embedding Model for Your Data
- Evaluating Your Embedding Model
- Training Your Own Text Embedding Model
- A Beginner's Guide to Website Chunking and Embedding for Your RAG Applications
- What is RAG?