Hugging Face and Zilliz Cloud Integration
Hugging Face and Zilliz Cloud integrate to power AI applications with cutting-edge embedding models and scalable vector storage. The integration combines Hugging Face's open-source library of 728,000+ models and 160,000+ datasets with Zilliz Cloud's high-performance vector database to support semantic search, RAG, and recommendation systems.
What is Hugging Face
Hugging Face is a comprehensive AI/ML ecosystem serving as a collaborative platform for the AI community. It offers an open-source library of 728,000+ models and 160,000+ datasets, along with tools for model training, fine-tuning, and deployment. The platform provides pre-trained models across NLP, computer vision, and more, with the MTEB (Massive Text Embedding Benchmark) leaderboard helping developers select the best embedding models by rank, retrieval performance, token length, and embedding dimension.
By integrating with Zilliz Cloud (fully managed Milvus), Hugging Face's cutting-edge embedding models are paired with a high-performance vector database, enabling seamless conversion of unstructured data into vector embeddings stored and searched at scale for semantic similarity search, RAG, anomaly detection, and recommender systems.
Benefits of the Hugging Face + Zilliz Cloud Integration
- 728,000+ models at your fingertips: Hugging Face's vast model library gives developers flexibility to choose the best embedding model for their use case, with Zilliz Cloud efficiently storing and searching the resulting vectors regardless of model choice.
- MTEB leaderboard for model selection: The MTEB leaderboard helps developers filter and select optimal embedding models by retrieval performance, token length, and dimension, ensuring the best match for Zilliz Cloud-powered applications.
- Streamlined workflow: From model selection on Hugging Face to embedding generation with Transformers to vector storage in Zilliz Cloud, the integration provides a streamlined path from data to production-ready AI applications.
- Flexible embedding dimensions: Hugging Face models produce embeddings of various dimensions (e.g., 384 for all-MiniLM-L6-v2), and Zilliz Cloud handles any dimension efficiently with automatic indexing and similarity search.
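The dimension-flexibility point can be illustrated without any external libraries: similarity search is the same computation whatever the embedding size, as long as query and stored vectors agree on it. The following is a minimal pure-Python sketch (not the pymilvus API) of cosine similarity over vectors of two different dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors of any (matching) dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The same routine scores 4-dimensional and 384-dimensional vectors alike;
# only the vector length must match the collection's configured dimension.
small = cosine_similarity([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0])
large = cosine_similarity([0.1] * 384, [0.1] * 384)
print(round(small, 4), round(large, 4))
```

In Zilliz Cloud the dimension is fixed per collection at creation time (e.g. 384 for all-MiniLM-L6-v2), so switching to a model with a different output size means creating a collection with the new dimension.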
How the Integration Works
Hugging Face serves as the model and data layer, providing pre-trained embedding models (e.g., all-MiniLM-L6-v2) via the Transformers library and datasets via the Datasets library. It handles tokenization, model inference, and embedding generation, converting text data into high-dimensional vector representations.
Zilliz Cloud serves as the vector database layer, storing and indexing the embeddings generated by Hugging Face models. It provides high-performance similarity search with support for auto IDs, dynamic fields, and multiple consistency levels, enabling fast retrieval of the most relevant results.
Together, Hugging Face and Zilliz Cloud create a complete semantic search solution: datasets are loaded from Hugging Face, text is encoded into embeddings using pre-trained models, and the vectors are stored in Zilliz Cloud. When a user submits a query, it is embedded using the same model and Zilliz Cloud performs similarity search to find the closest matching items — enabling applications like question answering, recommendations, and RAG.
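The flow above (encode documents, store vectors, embed the query with the same model, rank by similarity) can be sketched with a toy in-memory index. The bag-of-words encoder and `VOCAB` below are illustrative stand-ins, not the real components; in practice a Hugging Face model produces the embeddings and a Zilliz Cloud collection stores and searches them, as the guide below shows:

```python
import math

# Stand-in for a Hugging Face embedding model: a tiny bag-of-words encoder.
VOCAB = ["milvus", "vector", "database", "model", "transformer", "search"]

def embed(text):
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # L2-normalized

# Stand-in for a vector database collection: a list of (text, embedding) rows.
index = [(doc, embed(doc)) for doc in [
    "Milvus is a vector database",
    "Transformers provide the embedding model",
]]

def search(query, limit=1):
    # Embed the query with the SAME encoder used at insert time,
    # then rank stored rows by inner product (cosine, since normalized).
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, row)), doc) for doc, row in index]
    return [doc for _, doc in sorted(scored, reverse=True)[:limit]]

print(search("which database stores vectors?"))
```

The essential property the toy preserves is that queries and documents share one embedding space; mixing encoders between insert and search breaks retrieval, which is why the guide reuses `encode_text` for both.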
Step-by-Step Guide
1. Install Required Packages
```shell
$ pip install --upgrade pymilvus transformers datasets torch
```
2. Load Data from Hugging Face Datasets
Load example question-answer pairs from the SQuAD dataset:
```python
from datasets import load_dataset

DATASET = "squad"
INSERT_RATIO = 0.001

# Sample a small fraction of the validation split for the demo
data = load_dataset(DATASET, split="validation")
data = data.train_test_split(test_size=INSERT_RATIO, seed=42)["test"]
# Keep the first answer text; drop columns we won't store
data = data.map(
    lambda val: {"answer": val["answers"]["text"][0]},
    remove_columns=["id", "answers", "context"],
)
print(data)
```
3. Generate Embeddings with Hugging Face Models
Select a text embedding model and define the encoding function:
```python
from transformers import AutoTokenizer, AutoModel
import torch

MODEL = "sentence-transformers/all-MiniLM-L6-v2"
INFERENCE_BATCH_SIZE = 64

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)


def encode_text(batch):
    encoded_input = tokenizer(
        batch["question"], padding=True, truncation=True, return_tensors="pt"
    )
    with torch.no_grad():
        model_output = model(**encoded_input)
    # Mean pooling: average token embeddings, weighted by the attention mask
    token_embeddings = model_output[0]
    attention_mask = encoded_input["attention_mask"]
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    sentence_embeddings = torch.sum(
        token_embeddings * input_mask_expanded, 1
    ) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    # L2-normalize so inner product equals cosine similarity
    batch["question_embedding"] = torch.nn.functional.normalize(
        sentence_embeddings, p=2, dim=1
    )
    return batch


data = data.map(encode_text, batched=True, batch_size=INFERENCE_BATCH_SIZE)
data_list = data.to_list()
```
4. Insert Data into Milvus
Connect to Milvus, create a collection, and insert the data:
```python
from pymilvus import MilvusClient

MILVUS_URI = "./huggingface_milvus_test.db"
COLLECTION_NAME = "huggingface_test"
DIMENSION = 384  # all-MiniLM-L6-v2 produces 384-dimensional embeddings

milvus_client = MilvusClient(MILVUS_URI)
if milvus_client.has_collection(collection_name=COLLECTION_NAME):
    milvus_client.drop_collection(collection_name=COLLECTION_NAME)
milvus_client.create_collection(
    collection_name=COLLECTION_NAME,
    dimension=DIMENSION,
    auto_id=True,
    enable_dynamic_field=True,
    vector_field_name="question_embedding",
    consistency_level="Strong",
)
milvus_client.insert(collection_name=COLLECTION_NAME, data=data_list)
```
As for the argument of `MilvusClient`: setting the `uri` as a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically utilizes Milvus Lite to store all data in this file. If you have a large scale of data, you can set up a more performant Milvus server on Docker or Kubernetes. If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the Public Endpoint and API Key in Zilliz Cloud.
5. Ask Questions and Search
Generate embeddings for the questions with the same model, then search in Milvus:
```python
questions = {
    "question": [
        "What is LGM?",
        "When did Massachusetts first mandate that children be educated in schools?",
    ]
}

question_embeddings = [
    v.tolist() for v in encode_text(questions)["question_embedding"]
]

search_results = milvus_client.search(
    collection_name=COLLECTION_NAME,
    data=question_embeddings,
    limit=3,
    output_fields=["answer", "question"],
)

for q, res in zip(questions["question"], search_results):
    print("Question:", q)
    for r in res:
        print(
            {
                "answer": r["entity"]["answer"],
                "score": r["distance"],
                "original question": r["entity"]["question"],
            }
        )
    print("\n")
```
Learn More
- Question Answering Using Milvus and Hugging Face — Official Milvus tutorial for QA with Hugging Face
- Effortless AI Workflows: A Beginner's Guide to Hugging Face and PyMilvus — Zilliz tutorial on Hugging Face and PyMilvus
- Scaling Search with Milvus: Handling Massive Datasets with Ease — Zilliz blog on scaling search with Hugging Face datasets
- Hugging Face Models — Hugging Face model hub
- MTEB Leaderboard — Massive Text Embedding Benchmark leaderboard