BAAI / bge-base-zh-v1.5
Milvus Integrated
Task: Embedding
Modality: Text
Similarity Metric: Any (Normalized)
License: Apache 2.0
Dimensions: 768
Max Input Tokens: 512
Price: Free
Introduction to bge-base-zh-v1.5
bge-base-zh-v1.5 is a BAAI General Embedding (BGE) model that maps Chinese text to a 768-dimensional dense vector. Compared with BAAI's earlier text embedding model versions, it produces a more reasonable similarity distribution.
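The model summary lists the similarity metric as "Any (Normalized)". The sketch below illustrates why the choice of metric does not change the ranking once vectors have unit length: for unit vectors, the squared L2 distance equals 2 minus twice the inner product, and the inner product equals the cosine similarity. The three-dimensional toy vectors are stand-ins for real 768-dimensional embeddings, and the helper names are illustrative only.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors standing in for normalized embeddings (assumption: real
# embeddings are 768-dimensional and already unit length).
query = normalize([1.0, 2.0, 3.0])
candidates = [
    normalize(v)
    for v in ([1.0, 2.0, 2.5], [3.0, 1.0, 0.5], [-1.0, 0.0, 1.0])
]

# Rank by inner product (higher is better) and by L2 distance (lower is better).
by_ip = sorted(range(len(candidates)),
               key=lambda i: -inner_product(query, candidates[i]))
by_l2 = sorted(range(len(candidates)),
               key=lambda i: l2_distance(query, candidates[i]))

print(by_ip, by_l2)  # both orderings are the same
```

Because the orderings agree, the metric can be chosen to suit the index type rather than the model.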
Compare bge-base-zh-v1.5 with other popular BGE models:
Model | Dimensions | Max Tokens | Benchmark avg (C-MTEB for zh models, MTEB for en models) |
---|---|---|---|
bge-large-zh-v1.5 | 1024 | 512 | 64.53 |
bge-large-en | 1024 | 512 | 64.20 |
bge-base-zh-v1.5 | 768 | 512 | 63.13 |
bge-base-en | 768 | 512 | 62.96 |
bge-small-zh-v1.5 | 384 | 512 | 57.82 |
bge-small-zh | 384 | 512 | 58.27 |
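The dimension column also determines how much raw storage each embedding needs. As a rough sketch, assuming vectors are stored as 32-bit floats and ignoring index and metadata overhead, the footprint can be estimated as follows. The helper name `raw_vector_bytes` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as float32

def raw_vector_bytes(dimensions, num_vectors=1):
    """Raw storage for float32 vectors, excluding index and metadata overhead."""
    return dimensions * BYTES_PER_FLOAT32 * num_vectors

# Dimensions taken from the comparison table above.
models = {
    "bge-large-zh-v1.5": 1024,
    "bge-base-zh-v1.5": 768,
    "bge-small-zh-v1.5": 384,
}

for name, dim in models.items():
    per_vector = raw_vector_bytes(dim)
    per_million_gib = raw_vector_bytes(dim, 1_000_000) / 2**30
    print(f"{name}: {per_vector} bytes per vector, "
          f"{per_million_gib:.2f} GiB per million vectors")
```

Under this assumption, a bge-base-zh-v1.5 vector takes 3,072 bytes, three quarters of the 4,096 bytes needed by the large model.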
How to create embeddings with bge-base-zh-v1.5
There are two primary ways to create vector embeddings:
- PyMilvus: the Python SDK for Milvus, which integrates bge-base-zh-v1.5.
- FlagEmbedding: the official Python library offered by BAAI.
These methods allow developers to easily incorporate advanced text embedding capabilities into their applications.
Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:
- Sign up for a Zilliz Cloud account for free.
- Set up a serverless cluster and obtain the Public Endpoint and API Key.
- Create a vector collection and insert your vector embeddings.
- Run a semantic search on the stored embeddings.
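The examples that follow refer to `ZILLIZ_PUBLIC_ENDPOINT` and `ZILLIZ_API_KEY` without defining them. One possible way to supply them, sketched here, is to read them from environment variables so that credentials stay out of source code. The environment variable names and the `load_zilliz_credentials` helper are assumptions made for illustration, not part of PyMilvus or Zilliz Cloud.

```python
import os

def load_zilliz_credentials(env=os.environ):
    """Return (endpoint, api_key) from a mapping of environment variables.

    The variable names are hypothetical; use whatever naming your
    deployment prefers.
    """
    endpoint = env.get("ZILLIZ_PUBLIC_ENDPOINT")
    api_key = env.get("ZILLIZ_API_KEY")
    missing = [
        name
        for name, value in (("ZILLIZ_PUBLIC_ENDPOINT", endpoint),
                            ("ZILLIZ_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return endpoint, api_key
```

The returned pair can then be passed as `uri` and `token` when constructing `MilvusClient`.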
Generate vector embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search
from pymilvus import model, MilvusClient

# Placeholders: replace with the Public Endpoint and API Key of your cluster
ZILLIZ_PUBLIC_ENDPOINT = "<your-public-endpoint>"
ZILLIZ_API_KEY = "<your-api-key>"

ef = model.dense.SentenceTransformerEmbeddingFunction(
    model_name="BAAI/bge-base-zh-v1.5",
    device="cpu",
    # Instruction prepended to queries for retrieval
    query_instruction="为这个句子生成表示以用于检索相关文章:"
)

# Generate embeddings for documents
docs = [
    "人工智能作为一门学术学科成立于1956年。",
    "艾伦·图灵是第一个在人工智能领域进行实质性研究的人。",
    "图灵出生于伦敦的梅达维尔,并在英格兰南部长大。"
]
docs_embeddings = ef.encode_documents(docs)

# Generate embeddings for queries
queries = ["人工智能是什么时候成立的?",
           "艾伦·图灵出生在哪里?"]
query_embeddings = ef.encode_queries(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

# Recreate the collection so the example starts from a clean state
COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=ef.dim,
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search with the query embeddings and return the stored text
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
For more information, refer to our PyMilvus Embedding Model documentation.
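To inspect what the search returned, the hits can be turned into readable lines. The sketch below assumes that `results` holds one list of hits per query and that each hit behaves like a dictionary with a `distance` value and an `entity` mapping containing the requested `text` field. That shape is an assumption about the client's return value, and `format_hits` is a hypothetical helper.

```python
def format_hits(queries, results):
    """Build printable lines, pairing each query with its ranked hits.

    Assumes each hit looks like:
    {"id": ..., "distance": float, "entity": {"text": str}}
    """
    lines = []
    for query, hits in zip(queries, results):
        lines.append(f"Query: {query}")
        for rank, hit in enumerate(hits, start=1):
            text = hit["entity"]["text"]
            lines.append(f"  {rank}. {text} (distance={hit['distance']:.4f})")
    return lines

# Usage with the variables from the example above:
# for line in format_hits(queries, results):
#     print(line)
```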
Generate vector embeddings via FlagEmbedding Python library and insert them into Zilliz Cloud for semantic search
from FlagEmbedding import FlagModel
from pymilvus import MilvusClient

# Placeholders: replace with the Public Endpoint and API Key of your cluster
ZILLIZ_PUBLIC_ENDPOINT = "<your-public-endpoint>"
ZILLIZ_API_KEY = "<your-api-key>"

model = FlagModel("BAAI/bge-base-zh-v1.5",
                  # Instruction prepended to queries for retrieval
                  query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
                  use_fp16=False)

# Generate embeddings for documents
docs = [
    "人工智能作为一门学术学科成立于1956年。",
    "艾伦·图灵是第一个在人工智能领域进行实质性研究的人。",
    "图灵出生于伦敦的梅达维尔,并在英格兰南部长大。"
]
docs_embeddings = model.encode(docs)

# Generate embeddings for queries
queries = ["人工智能是什么时候成立的?",
           "艾伦·图灵出生在哪里?"]
query_embeddings = model.encode_queries(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

# Recreate the collection so the example starts from a clean state
COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=768,  # output dimension of bge-base-zh-v1.5
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search with the query embeddings and return the stored text
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
For more information, refer to the model page on HuggingFace.
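Both examples insert one document per call. For larger document sets it may be preferable to build all rows first and send them in a single `client.insert` call, which accepts a list of dictionaries. The following is a minimal sketch of that preparation step; `build_rows` is a hypothetical helper, and the `text` and `vector` field names simply mirror the examples above.

```python
def build_rows(docs, embeddings):
    """Pair each document with its embedding as a row dictionary.

    Embeddings may be plain lists or array rows; values are converted
    to Python floats so the rows are easy to inspect and serialize.
    """
    if len(docs) != len(embeddings):
        raise ValueError("docs and embeddings must have the same length")
    return [
        {"text": doc, "vector": [float(x) for x in embedding]}
        for doc, embedding in zip(docs, embeddings)
    ]

# Usage with the variables from the examples above:
# rows = build_rows(docs, docs_embeddings)
# client.insert(collection_name=COLLECTION, data=rows)
```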