Similarity Metrics for Vector Search

You can’t compare apples and oranges. Or can you? Vector databases like Milvus allow you to compare any data you can vectorize. You can even do it right in your Jupyter Notebook. But how does vector similarity search work?
Vector search has two critical conceptual components: indexes and distance metrics. Some popular vector indexes include HNSW, IVF, and ScaNN. There are three primary distance metrics: L2 or Euclidean distance, cosine similarity, and inner product. Beyond those, Manhattan distance sums the absolute differences across each dimension and is useful when sensitivity to outliers needs to be minimized, and binary vectors have their own metrics, such as the Hamming distance and the Jaccard index.
In this article, we’ll cover:
- Vector Similarity Metrics
  - L2 or Euclidean
    - How Does L2 Distance Work?
    - When Should You Use Euclidean Distance?
  - Cosine Similarity
    - How Does Cosine Similarity Work?
    - When Should You Use Cosine Similarity?
  - Inner Product
    - How Does Inner Product Work?
    - When Should You Use Inner Product?
- Other Interesting Vector Similarity or Distance Metrics
  - Hamming Distance
  - Jaccard Index
- Summary of Vector Similarity Search Metrics
Vectors can be represented as lists of numbers or as an orientation and a magnitude. The easiest way to picture this is to imagine vectors as line segments pointing in specific directions in space.
The L2 or Euclidean metric is the “hypotenuse” metric of two vectors. It measures the straight-line distance between the points where your vectors end.
Cosine similarity is based on the angle between your vectors where they meet: it is the cosine of that angle.
The inner product is the “projection” of one vector onto the other. Intuitively, it reflects both the lengths of the vectors and the angle between them.
The most intuitive distance metric is L2 or Euclidean distance. We can imagine this as the amount of space between two objects. For example, how far your screen is from your face.
So, we’ve imagined how L2 distance works in space; how does it work in math? Let’s begin by imagining both vectors as a list of numbers. Line the lists up on top of each other and subtract downwards. Then, square all of the results and add them up. Finally, take a square root.
Milvus skips the square root because ranking by the squared distance produces the same order as ranking by the true distance. Skipping that operation gives the same results while lowering latency and cost and increasing throughput. Below is an example of how Euclidean or L2 distance works.
d(Queen, King) = √((0.3 − 0.5)² + (0.9 − 0.7)²)
             = √((−0.2)² + (0.2)²)
             = √(0.04 + 0.04)
             = √0.08 ≈ 0.28
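To make the arithmetic concrete, here is a minimal NumPy sketch of the same calculation. The queen and king vectors are the toy two-dimensional values from the example above, not real embeddings, and the squared-L2 line illustrates the shortcut described earlier.

```python
import numpy as np

# Toy 2-D "embeddings" from the worked example above (not real model outputs).
queen = np.array([0.3, 0.9])
king = np.array([0.5, 0.7])

# Euclidean (L2) distance: subtract, square, sum, then take the square root.
l2 = np.sqrt(np.sum((queen - king) ** 2))

# Squared L2 skips the square root; the neighbor ranking is identical,
# which is why an engine can use it internally to save an operation.
squared_l2 = np.sum((queen - king) ** 2)

print(l2)          # ~0.283
print(squared_l2)  # 0.08
```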
One of the main reasons to use Euclidean distance is when your vectors have different magnitudes and you primarily care about how far apart they sit in space, that is, their semantic distance.
We use the term “cosine similarity” or “cosine distance” to denote the difference between the orientation of two vectors. For example, how far would you turn to face the front door?
Fun and applicable fact: although “similarity” and “distance” mean different things on their own, putting “cosine” in front of either term gives you two closely related measures, since cosine distance is simply 1 minus cosine similarity. This is another example of semantic similarity at play.
So, we know that cosine similarity measures the angle between two vectors. Once again, we imagine our vectors as a list of numbers. The process is a bit more complex this time, though.
We begin by lining the vectors up on top of each other again. Multiply the numbers downward and then add all of the results up; save that number and call it “x.” Next, for each vector separately, square each entry and sum the squares. Take the square root of each of those two sums, multiply the roots together, and call the result “y.” The cosine similarity is then “x” divided by “y.”
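Here is the same recipe as a minimal NumPy sketch, reusing the toy queen and king vectors from the L2 example (purely illustrative values):

```python
import numpy as np

queen = np.array([0.3, 0.9])
king = np.array([0.5, 0.7])

# "x": multiply elementwise and sum -- the dot product.
x = np.dot(queen, king)

# "y": the product of the two vector lengths (square, sum, square root for each).
y = np.sqrt(np.sum(queen ** 2)) * np.sqrt(np.sum(king ** 2))

cosine_similarity = x / y
cosine_distance = 1 - cosine_similarity  # what a "distance" metric would report

print(cosine_similarity, cosine_distance)
```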
Cosine similarity is primarily used in NLP applications. The main thing that cosine similarity measures is the difference in semantic orientation. If you work with normalized vectors, cosine similarity is equivalent to the inner product.
The inner product is the projection of one vector onto the other. Its value is the length of that projection multiplied by the length of the vector it is projected onto. The bigger the angle between the two vectors, the smaller the inner product; it also grows with the lengths of the vectors themselves. So, we use the inner product when we care about both orientation and magnitude. For example, the straight path you would have to run through the walls to reach your refrigerator.
The inner product should look familiar: it’s just the first ⅓ of the cosine calculation. Line those vectors up in your mind and go down the row, multiplying downward. Then, sum the products up; that single number is the inner product.
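A quick NumPy sketch of that step, again with the toy vectors from above; the second half illustrates the normalized-vector point from the cosine section:

```python
import numpy as np

queen = np.array([0.3, 0.9])
king = np.array([0.5, 0.7])

# Inner product: multiply elementwise, then sum (the first step of the cosine recipe).
ip = np.dot(queen, king)  # equivalently: np.sum(queen * king)
print(ip)

# On L2-normalized vectors the inner product equals cosine similarity.
queen_n = queen / np.linalg.norm(queen)
king_n = king / np.linalg.norm(king)
print(np.dot(queen_n, king_n))  # same value as the cosine similarity above
```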
The inner product is like a cross between Euclidean distance and cosine similarity. On normalized datasets it is identical to cosine similarity, so IP suits either normalized or non-normalized datasets. It is also faster to compute than cosine similarity, making it a more flexible option.
One thing to keep in mind with inner product is that it doesn’t satisfy the triangle inequality, and vectors with larger magnitudes are prioritized. This means we should be careful when using IP with an Inverted File (IVF) index or a graph index like HNSW.
The three vector metrics mentioned above are the most useful regarding vector embeddings. However, they’re not the only ways to measure the distance between two vectors. Here are two other ways to measure distance or similarity between vectors.
Hamming distance can be applied to vectors or strings. For our use cases, let’s stick to vectors. Hamming distance measures the “difference” in the entries of two vectors. For example, “1011” and “0111” have a Hamming distance of 2.
In terms of vector embeddings, Hamming distance only really makes sense for binary vectors. Float vector embeddings, the outputs of the second-to-last layer of neural networks, are made up of floating-point numbers. Examples could include [0.24, 0.111, 0.21, 0.51235] and [0.33, 0.664, 0.125152, 0.1].
As you can see, the Hamming distance between two float vector embeddings will almost always come out to the length of the vector itself, because there are simply too many possible values for any entry to match exactly. That’s why Hamming distance is applied only to binary or sparse vectors, such as those produced by a process like TF-IDF, BM25, or SPLADE.
Hamming distance is good to measure something like the difference in wording between two texts, the difference in the spelling of words, or the difference between any two binary vectors. But it’s not good for measuring the difference between vector embeddings.
Here’s a fun fact. Hamming distance is equivalent to summing the result of an XOR operation on two vectors.
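As a minimal sketch, here is that XOR trick applied to the “1011” and “0111” example from above:

```python
import numpy as np

a = np.array([1, 0, 1, 1], dtype=bool)  # "1011"
b = np.array([0, 1, 1, 1], dtype=bool)  # "0111"

# XOR is 1 exactly where the entries differ, so summing it gives the Hamming distance.
hamming = np.sum(a ^ b)
print(hamming)  # 2
```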
Jaccard is another way to measure two vectors’ similarity or distance. The interesting thing about Jaccard is that there is both a Jaccard index and a Jaccard distance: the Jaccard distance is 1 minus the Jaccard index, and the distance is the metric Milvus implements.
Calculating the Jaccard distance or index is an interesting task because it doesn’t exactly make sense at first glance. Like Hamming distance, Jaccard only works on binary data. I find the traditional formulation in terms of “unions” and “intersections” confusing, so I think about it with logic: the Jaccard distance is essentially (A “OR” B minus A “AND” B) divided by A “OR” B.
We count the number of entries where either A or B is 1 as the “union” and the entries where both A and B are 1 as the “intersection.” So the Jaccard index for A (01100111) and B (01010110) is 3/6 = ½. In this case, the Jaccard distance, 1 minus the Jaccard index, is also ½.
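A small NumPy sketch of that calculation using the A and B vectors above:

```python
import numpy as np

A = np.array([0, 1, 1, 0, 0, 1, 1, 1], dtype=bool)
B = np.array([0, 1, 0, 1, 0, 1, 1, 0], dtype=bool)

intersection = np.sum(A & B)  # entries where both are 1 -> 3
union = np.sum(A | B)         # entries where either is 1 -> 6

jaccard_index = intersection / union   # 0.5
jaccard_distance = 1 - jaccard_index   # 0.5, the distance Milvus reports
print(jaccard_index, jaccard_distance)
```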
In this post, we learned about the three most useful vector similarity search metrics: L2 (also known as Euclidean) distance, cosine distance, and inner product. Each of these has different use cases. Euclidean is for when we care about the difference in magnitude. Cosine is for when we care about the difference in orientation. The inner product is for when we care about both magnitude and orientation.
Check these videos to learn more about Vector Similarity Metrics, or read the docs to learn how to configure these metrics in Milvus.
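If you want to try this yourself, the sketch below shows roughly how a metric is selected when building and querying an index with pymilvus. Treat the collection name, field name, and parameter values as illustrative placeholders, and check the docs for the exact API in your Milvus/pymilvus version.

```python
from pymilvus import connections, Collection

# Assumes a running Milvus instance and an existing collection with a
# float-vector field named "embedding"; names and values here are illustrative.
connections.connect(host="localhost", port="19530")
collection = Collection("demo_collection")

# The distance metric is chosen when the index is created.
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",   # or IVF_FLAT, etc.
        "metric_type": "L2",    # "IP" and "COSINE" are the other common choices
        "params": {"M": 16, "efConstruction": 200},
    },
)

# The same metric_type is passed again at query time.
results = collection.search(
    data=[[0.3, 0.9]],          # toy query vector; must match the field's dimension
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"ef": 64}},
    limit=5,
)
```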
Introduction to Similarity Metrics
Similarity metrics are a crucial tool in various data analysis and machine learning tasks. They enable us to compare and evaluate the similarity between different pieces of data, facilitating applications such as clustering, classification, and recommendations. With numerous similarity metrics available, each with its strengths and weaknesses, choosing the right one for a specific task can be challenging. In this section, we will introduce the concept of similarity metrics, their importance, and provide an overview of the most commonly used metrics.
Cosine Similarity
Cosine similarity is a widely used similarity metric that measures the cosine of the angle between two vectors. It is commonly used in natural language processing and information retrieval tasks. The cosine similarity metric is particularly useful when dealing with high-dimensional data, as it is computationally efficient and can handle sparse data. The cosine similarity between two vectors can be calculated using the dot product of the vectors divided by the product of their magnitudes.
Euclidean Distance
Euclidean distance, also known as straight-line distance, is a widely used distance metric that measures the distance between two points in n-dimensional space. It is calculated as the square root of the sum of the squared differences between the corresponding elements of the two vectors. Euclidean distance is commonly used in various applications, including clustering, classification, and regression analysis. However, it can be sensitive to outliers and may not perform well with high-dimensional data.
Choosing the Right Similarity Metric
Choosing the right similarity metric depends on various factors, including the type of data, analysis goals, and the relationship between variables. For example, cosine similarity is suitable for high-dimensional data and natural language processing tasks, while Euclidean distance is commonly used for clustering and classification tasks. Manhattan distance, also known as L1 distance, is suitable for data with outliers, while Hamming distance is used for binary data. It is essential to understand the characteristics and limitations of each similarity metric to choose the most appropriate one for a specific task.
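To make the comparison concrete, here is a small NumPy sketch computing a few of these metrics on the same pair of made-up vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 1.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # L2: straight-line distance
manhattan = np.sum(np.abs(a - b))           # L1: sum of absolute differences
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean, manhattan, cosine_sim)
```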
Real-World Applications
Similarity metrics have numerous real-world applications in various fields, including:
- Natural language processing: Cosine similarity is widely used in text classification, sentiment analysis, and information retrieval tasks.
- Recommendation systems: Similarity metrics, such as cosine similarity and Euclidean distance, are used to recommend products or services based on user behavior and preferences.
- Image and video analysis: Similarity metrics, such as Euclidean distance and Manhattan distance, are used in image and video classification, object detection, and tracking tasks.
- Clustering and classification: Similarity metrics, such as Euclidean distance and cosine similarity, are used in clustering and classification tasks to group similar data points together.
In conclusion, similarity metrics are a crucial tool in various data analysis and machine learning tasks. Understanding the characteristics and limitations of each similarity metric is essential to choose the most appropriate one for a specific task. By selecting the right similarity metric, we can improve the accuracy and relevance of our results, leading to better decision-making and insights.