An Introduction to Vector Embeddings: What They Are and How to Use Them
In this blog post, we explain the concept of vector embeddings and explore their applications, best practices, and tools for working with them.
Understand vector embeddings and when and how to use them. Explore real-world applications with Milvus and Zilliz Cloud vector databases.
Vector embeddings are numerical representations of data points that make unstructured data easier to search. These embeddings are stored in specialized databases like Milvus and Zilliz Cloud (fully managed Milvus), which use advanced algorithms and indexing techniques for quick data retrieval.
Modern artificial intelligence (AI) models, like Large Language Models (LLMs), use text vector embeddings to understand natural language and generate relevant responses. Moreover, LLM applications built on Retrieval Augmented Generation (RAG) retrieve information from external vector stores for task-specific use cases.
What are Vector Embeddings?
A vector embedding is a list of numbers, with each number representing a feature of the data. These embeddings are learned by analyzing relationships within a dataset. Data points whose embeddings lie closer to each other are treated as semantically similar.
The embeddings are produced by deep learning models trained to map data to a high-dimensional vector space. Popular embedding models like BERT and Data2Vec form the basis of many modern deep-learning applications.
Moreover, vector embeddings are widely used in natural language processing (NLP) and computer vision (CV) applications because comparing vectors is computationally efficient.
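To make "closer means more similar" concrete, here is a minimal sketch that compares toy embeddings with cosine similarity. The three-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions), and `cosine_similarity` is a helper written for this example, not a library function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

sim_cat_kitten = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
sim_cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

print(f"cat vs kitten: {sim_cat_kitten:.3f}")
print(f"cat vs car:    {sim_cat_car:.3f}")
```

With these made-up values, "cat" and "kitten" score close to 1, while "cat" and "car" score much lower.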
Types of Vector Embeddings
There are three main types of embeddings based on how their values are stored: dense, sparse, and binary embeddings. Here’s how they differ in characteristics and use:
1. Dense Embeddings
Dense embeddings are vectors in which most elements are non-zero. They capture finer details because every dimension carries information, but they store every value, including zeros, which makes them less storage-efficient than sparse embeddings.
Word2Vec, GloVe, CLIP, and BERT are models that generate dense vector embeddings from input data.
2. Sparse Embeddings
Sparse vector embeddings are high-dimensional vectors in which most elements are zero. The non-zero values represent the relative importance of features, such as terms, in a corpus. Sparse embeddings require less memory and storage and are suitable for high-dimensional sparse data like word frequencies.
TF-IDF and SPLADE are popular methods of generating sparse vector embeddings.
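As a rough illustration of sparsity, the sketch below computes TF-IDF weights and keeps only the non-zero entries of each vector in a dictionary. It uses the plain, unsmoothed formula tf × log(N / df), which is an assumption made for brevity; library implementations usually add smoothing and normalization. The three sample documents are invented.

```python
import math
from collections import Counter

def tfidf_sparse(docs):
    """Return one {term: weight} dict per document (a sparse vector)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            weight = tf * idf
            if weight > 0:  # store only the non-zero entries
                vec[term] = weight
        vectors.append(vec)
    return vectors

# Invented sample corpus.
docs = [
    "milvus stores vector embeddings",
    "embeddings represent unstructured data",
    "milvus indexes vector data",
]
sparse_vectors = tfidf_sparse(docs)
vocabulary = sorted({term for doc in docs for term in doc.lower().split()})

print(len(vocabulary), "terms in the vocabulary")
print(len(sparse_vectors[0]), "non-zero entries in the first vector")
```

Each vector conceptually has one dimension per vocabulary term, but only the terms that occur in the document need to be stored. Rare terms such as "stores" receive a higher weight than terms shared across documents.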
3. Binary Embeddings
A binary embedding stores each dimension as a single bit, either 1 or 0. This form of storage is substantially more compact than 32-bit floating-point numbers and speeds up data retrieval. However, it does lead to information loss since we are reducing data precision.
Regardless, binary embeddings are popular in use cases where speed matters more than a slight loss of accuracy.
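The sketch below shows one simple way to turn a dense vector into a binary one: thresholding each value at zero and packing the bits into bytes. The threshold rule is an assumption chosen for illustration (binary embedding models may use other schemes), and the eight-dimensional vectors are invented.

```python
def binarize(vector):
    """Map each float to one bit: 1 if the value is positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]

def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 dimensions per byte."""
    padded = bits + [0] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for bit in padded[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def hamming_distance(a, b):
    """Number of differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Hypothetical 8-dimensional dense embeddings, invented for illustration.
dense_a = [0.12, -0.40, 0.88, -0.05, 0.31, 0.02, -0.77, 0.64]
dense_b = [0.10, -0.35, 0.91, 0.07, 0.29, -0.01, -0.80, 0.60]

packed_a = pack_bits(binarize(dense_a))
packed_b = pack_bits(binarize(dense_b))

print("float32 size:", len(dense_a) * 4, "bytes")
print("binary size: ", len(packed_a), "byte(s)")
print("Hamming distance:", hamming_distance(packed_a, packed_b))
```

Eight 32-bit floats occupy 32 bytes, while the binary version fits in a single byte. Comparing two binary embeddings reduces to counting differing bits (Hamming distance), which is why retrieval is fast.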
How are Vector Embeddings Created?
Sophisticated deep learning models and statistical methods help create vector embeddings. These models identify patterns and connections in input data to learn the difference between data points. Models generate vector embeddings in an n-dimensional space based on their understanding of underlying connections.
An n-dimensional space is hard to visualize with our 3-dimensional intuition, but it lets a model describe data along many independent features at once. High-dimensional vector embeddings capture finer details from data points, resulting in more accurate outputs.
For example, in textual data, high-dimensional space allows for capturing subtle differences in word meanings. A 2-dimensional space may place the words “tired” and “exhausted” at nearly the same point. An n-dimensional space has room to separate them along additional dimensions, capturing the difference in the intensity of the emotion. Mathematically, a vector v in n-dimensional space is written as:
$v = [v_1, v_2, \ldots, v_n]$
The two popular techniques for creating vector embeddings are:
Neural Networks
Neural networks, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers, excel at learning complex patterns in data. For example, BERT, a Transformer-based model, analyzes a word's neighboring terms on both sides to understand its meaning and generate embeddings.
Matrix Factorization
Unlike neural networks, matrix factorization is a simpler embedding technique. It takes training data as a matrix in which rows and columns represent two kinds of entities, and factorizes that matrix into the product of lower-rank matrices. Matrix factorization is popular in recommendation systems, where the input is the user rating matrix, with rows representing users and columns representing items (e.g., movies). Multiplying the user embedding matrix by the transpose of the item embedding matrix yields a matrix that approximates the original rating matrix.
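The factorization described above can be sketched in a few lines of plain Python. This is a minimal illustration using stochastic gradient descent, which is one common way to fit the factors and not necessarily what a production recommender uses. The rating matrix, the function names, and the hyperparameters (`k`, `epochs`, `lr`, `reg`) are all hypothetical.

```python
import math
import random

def factorize(ratings, k=2, epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn embeddings so that dot(U[u], V[i]) approximates ratings[u][i]."""
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    U = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                pred = sum(U[u][f] * V[i][f] for f in range(k))
                err = ratings[u][i] - pred
                for f in range(k):
                    u_f, v_f = U[u][f], V[i][f]
                    # Gradient step with a small L2 penalty on both factors.
                    U[u][f] += lr * (err * v_f - reg * u_f)
                    V[i][f] += lr * (err * u_f - reg * v_f)
    return U, V

def rmse(ratings, U, V):
    """Root-mean-square error between the ratings and U multiplied by V transposed."""
    total, count = 0.0, 0
    for u, row in enumerate(ratings):
        for i, r in enumerate(row):
            pred = sum(a * b for a, b in zip(U[u], V[i]))
            total += (r - pred) ** 2
            count += 1
    return math.sqrt(total / count)

# Hypothetical ratings: rows are users, columns are movies.
ratings = [
    [5, 4, 1],
    [4, 5, 1],
    [1, 1, 5],
    [2, 1, 4],
]

U0, V0 = factorize(ratings, epochs=0)  # random starting point, no training
U, V = factorize(ratings)              # trained user and item embeddings

print("RMSE before training:", round(rmse(ratings, U0, V0), 3))
print("RMSE after training: ", round(rmse(ratings, U, V), 3))
```

Each row of `U` is a user embedding and each row of `V` is an item embedding. Their dot products reconstruct an approximation of the rating matrix.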
Various tools and libraries simplify the process of generating embeddings from input data. The most popular libraries include TensorFlow, PyTorch, and Hugging Face. These open-source libraries and tools offer user-friendly documentation for creating embedding models.
The following table lists different embedding models, their descriptions, and links to the official documentation:
| Model | Description | Link |
| --- | --- | --- |
| Neural Networks | Neural networks like CNNs and RNNs effectively identify data patterns, which is useful for generating vector embeddings. Word2Vec is an example of a neural embedding model. | https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/video-lecture |
| Matrix Factorization | Matrix factorization is suitable for filtering tasks like recommendation systems. It captures user preferences by factorizing the input matrix. | https://developers.google.com/machine-learning/recommendation/collaborative/matrix |
| GloVe | GloVe is a static word embedding model trained on global word co-occurrence statistics. It generates a single embedding for each word, regardless of context. | https://nlp.stanford.edu/projects/glove/ |
| BERT | BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained model that analyzes textual data bidirectionally. | https://zilliz.com/learn/bge-m3-and-splade-two-machine-learning-models-for-generating-sparse-embeddings#BERT-The-Foundation-Model-for-BGE-M3-and-Splade |
| ColBERT | A token-level embedding and ranking model. | https://zilliz.com/learn/explore-colbert-token-level-embedding-and-ranking-model-for-similarity-search |
| SPLADE | An advanced embedding model for generating sparse embeddings. | https://zilliz.com/learn/bge-m3-and-splade-two-machine-learning-models-for-generating-sparse-embeddings#SPLADE |
| BGE-M3 | BGE-M3 is an advanced machine-learning model that extends BERT's capabilities. | https://zilliz.com/learn/bge-m3-and-splade-two-machine-learning-models-for-generating-sparse-embeddings#BGE-M3 |
What are Vector Embeddings Used for?
Vector embeddings are widely used in various modern search and AI tasks. Some of these tasks include:
- Similarity Search: Similarity search finds the data points closest to a query in high-dimensional space. It measures the distance between vector embeddings using measures such as Euclidean distance, cosine similarity, or Jaccard similarity. Modern search engines use similarity search to retrieve web pages relevant to user queries.
- Recommendation Systems: Recommendation systems rely on vectorized data to cluster similar items. Elements from the same cluster are then used as recommendations for the users. The systems create clusters on various levels, such as groups of users based on demographics and preferences, and groups of products. All this information is stored as vector embeddings for efficient and accurate retrieval at runtime.
- Retrieval Augmented Generation (RAG): RAG is a popular technique for reducing the hallucinations of large language models and providing them with additional knowledge. Embedding models transform external knowledge and user queries into vector embeddings. A vector database stores the embeddings and runs a similarity search to find the results most relevant to the user query. The LLM generates the final answer based on the retrieved contextual information.
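All three use cases rest on the same primitive: given a query embedding, find the nearest stored embeddings. A minimal brute-force sketch using Euclidean distance might look like the following. The catalog entries and their three-dimensional vectors are invented, and `top_k` is a helper defined here for illustration.

```python
import math

def top_k(query, items, k=2):
    """Return the k (label, distance) pairs closest to the query (Euclidean)."""
    scored = [(label, math.dist(query, vec)) for label, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Hypothetical product embeddings, invented for illustration.
catalog = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker": [0.0, 0.9, 0.3],
    "espresso machine": [0.1, 0.8, 0.4],
}

# Hypothetical embedding of a query such as "jogging footwear".
query = [0.88, 0.12, 0.02]

results = top_k(query, catalog, k=2)
for label, distance in results:
    print(f"{label}: {distance:.3f}")
```

This exhaustive scan compares the query against every stored vector, which is fine for a handful of items but slow at scale. Vector databases build indexes precisely to avoid scanning everything.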
Storing, Indexing, and Retrieving Vector Embeddings with Milvus
Milvus provides PyMilvus, a Python SDK, to store, index, and search vector embeddings. Here’s the step-by-step approach using the PyMilvus library:
1. Install Libraries and Set up a Milvus Database
Install pymilvus and gensim. PyMilvus is the Python SDK for Milvus, and gensim is a Python library for NLP. Run the following command to install the libraries:
!pip install -U pymilvus gensim
In this tutorial, we run Milvus in Docker, so make sure Docker is installed on your system. Run the following commands in your terminal to download the startup script and start Milvus:
> curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
> bash standalone_embed.sh start
Now the Milvus service has started and listens on port 19530 by default. To use it from Python, connect to the server with PyMilvus:
from pymilvus import connections
# connect to the Milvus standalone server started above (default host and port)
connections.connect("default", host="localhost", port="19530")
2. Generate Vector Embeddings
The following code creates a collection to store embeddings, loads a pre-trained model from gensim, and generates embeddings for simple words like ice and water:
import gensim.downloader as api
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema
# create a collection
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="words", dtype=DataType.VARCHAR, max_length=50),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=50),
]
schema = CollectionSchema(fields, "Demo to store and retrieve embeddings")
demo_milvus = Collection("milvus_demo", schema)
# load the pre-trained model from gensim
model = api.load("glove-wiki-gigaword-50")
# generate embeddings
ice = model['ice']
water = model['water']
cold = model['cold']
tree = model['tree']
man = model['man']
woman = model['woman']
child = model['child']
female = model['female']
3. Store Vector Embeddings
Store the vector embeddings generated in the previous step in the demo_milvus collection we created above:
# insert data into the collection, one inner list per field
data = [
    [1, 2, 3, 4, 5, 6, 7, 8],  # field pk
    ['ice', 'water', 'cold', 'tree', 'man', 'woman', 'child', 'female'],  # field words
    [ice, water, cold, tree, man, woman, child, female],  # field embeddings
]
insert_result = demo_milvus.insert(data)
# After the final entity is inserted, it is best to call flush so that no growing segments are left in memory
demo_milvus.flush()
4. Create Indexes on Entries
Indexes make vector search faster. The following code creates an IVF_FLAT index that uses the L2 (Euclidean distance) metric and sets the nlist parameter (the number of clusters) to 128:
index = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128},
}
demo_milvus.create_index("embeddings", index)
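To give an intuition for what the nlist index parameter and the nprobe search parameter control, here is a simplified, pure-Python sketch of the inverted-file (IVF) idea: vectors are grouped under their nearest centroid, and a query scans only the closest groups. This is not Milvus's implementation. In particular, real IVF indexes train centroids with k-means, while this sketch simply samples data points as centroids. All names and data here are hypothetical.

```python
import math
import random

def build_ivf(vectors, nlist, seed=0):
    """Sample nlist vectors as centroids and bucket each vector under its nearest one."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    buckets = {c: [] for c in range(nlist)}
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: math.dist(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets

def ivf_search(query, vectors, centroids, buckets, nprobe, limit):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in buckets[c]]
    candidates.sort(key=lambda idx: math.dist(query, vectors[idx]))
    return candidates[:limit]

# Hypothetical data: 200 random 4-dimensional vectors.
data_rng = random.Random(42)
vectors = [[data_rng.random() for _ in range(4)] for _ in range(200)]

centroids, buckets = build_ivf(vectors, nlist=8)
query = vectors[0]

approx = ivf_search(query, vectors, centroids, buckets, nprobe=2, limit=4)
exact = ivf_search(query, vectors, centroids, buckets, nprobe=8, limit=4)
print("nprobe=2:", approx)
print("nprobe=8:", exact)
```

A larger nprobe scans more buckets, which improves recall at the cost of speed. When nprobe equals nlist, every bucket is scanned and the search becomes exhaustive.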
5. Search Vector Embeddings
To search the vector embeddings, load the Milvus collection into memory using the .load() method and run a vector similarity search:
demo_milvus.load()
# perform a vector similarity search for the words closest to "cold"
data = [cold]
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 10},
}
result = demo_milvus.search(data, "embeddings", search_params, limit=4, output_fields=["words"])
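The search call returns one group of hits per query vector. A small helper like the one below can flatten those hits into (word, distance) pairs. `summarize_hits` is a hypothetical name introduced here, and the sketch assumes each hit exposes `entity.get(...)` and `distance`, as PyMilvus search results do.

```python
def summarize_hits(search_result, field="words"):
    """Turn a search result into a list of (field value, distance) pairs per query."""
    summary = []
    for hits in search_result:  # one group of hits per query vector
        summary.append([(hit.entity.get(field), hit.distance) for hit in hits])
    return summary
```

Calling `summarize_hits(result)[0]` on the result of the search above should list the four stored words nearest to "cold". Because "cold" itself was inserted into the collection, it is expected to come first with a distance of 0.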
Best Practices for Using Vector Embeddings
Obtaining optimal results with vector embeddings requires careful use of embedding models. The best practices for using vector embeddings are:
1. Selecting the Right Embedding Model
Different embedding models are suitable for different tasks. For example, CLIP is designed for multimodal tasks, and GloVe is designed for NLP tasks. Selecting embedding models based on data needs and computational limitations results in better outputs.
2. Optimizing Embedding Performance
Pre-trained models like BERT and CLIP offer a good starting point. However, they can be fine-tuned on domain-specific data for improved performance.
Hyperparameter tuning also helps find the combination of settings that gives the best performance. Data augmentation is another way to improve embedding model performance. It artificially increases the size and variety of the training data, making it useful for tasks with limited data.
3. Monitoring Embedding Model
Continuously monitoring embedding models tracks their performance over time. This reveals model degradation early, so you can fine-tune the models to keep results accurate.
4. Considering Evolving Needs
Evolving data needs, like growing data volumes or changing formats, may decrease accuracy. Retraining and fine-tuning models as those needs change helps maintain model performance.
Common Pitfalls and How to Avoid Them
Change in Model Architecture
Fine-tuning and hyperparameter tuning can change the underlying model's weights and, in some cases, its architecture. Since the model generates the vector embeddings, significant changes lead to different vector embeddings that are no longer comparable with the ones generated earlier.
To improve model performance without changing the model completely, avoid retraining all of its parameters from scratch. Instead, fine-tune pre-trained models like Word2Vec and BERT for specific tasks.
Data Drift
Data drift happens when data changes from what the model was trained on. This might result in inaccurate vector embeddings. Continuous monitoring of data ensures it stays consistent with model requirements.
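One simple way to watch for drift is to compare the average embedding of recent data against the average embedding of a reference sample. The sketch below does this with cosine distance between the two centroids. The two-dimensional vectors and the 0.1 alert threshold are hypothetical, and centroid comparison is only one of several possible drift signals.

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_distance(a, b):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def drift_score(reference, current):
    """Cosine distance between the centroids of two embedding samples."""
    return cosine_distance(centroid(reference), centroid(current))

# Hypothetical embeddings, invented for illustration.
reference = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
similar_batch = [[0.95, 0.05], [1.0, 0.05]]
shifted_batch = [[0.1, 1.0], [0.0, 0.9]]

DRIFT_THRESHOLD = 0.1  # hypothetical alert level; tune it for your own data

for name, batch in [("similar", similar_batch), ("shifted", shifted_batch)]:
    score = drift_score(reference, batch)
    status = "drift suspected" if score > DRIFT_THRESHOLD else "ok"
    print(f"{name}: {score:.4f} ({status})")
```

A batch that resembles the reference data yields a score near 0, while a batch whose embeddings point in a different direction yields a much larger score.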
Misleading Evaluation Metrics
Different evaluation metrics suit different tasks. Choosing evaluation metrics at random might result in misleading analysis that hides the model's true performance.
Carefully pick the evaluation metrics suitable for your tasks. For example, use cosine similarity to measure semantic similarity and the BLEU score for translation tasks.
Further Resources
The best way to build a deeper understanding of vector embeddings is by studying relevant resources, practicing, and engaging with industry professionals. Below are some ways to explore vector embeddings in more depth:
- Zilliz Learn series: https://zilliz.com/learn
- Zilliz Glossaries: https://zilliz.com/glossary
- Embedding models and their integration with Milvus: https://zilliz.com/product/integrations
- Hugging Face Models: https://huggingface.co/models
- Academic papers:
- Benchmark tool for evaluating vector databases: VectorDBBench: An Open-Source VectorDB Benchmark Tool
2. Community Engagement
Join our Discord community to connect with GenAI developers from various industries and discuss everything related to vector embeddings, vector databases, and AI. Follow relevant discussions on Stack Overflow, Reddit, and GitHub to learn potential issues you might encounter when working with embeddings and improve your debugging skills.
Staying up-to-date with resources and engaging with the community ensures that your skills grow as technology advances, which offers you a competitive advantage in the AI industry.