You can’t compare apples and oranges. Or can you? Vector databases like Milvus allow you to compare any data you can vectorize. You can even do it right in your Jupyter Notebook. But how does vector similarity search work?
Vector search has two critical conceptual components: indexes and distance metrics. Some popular vector indexes include HNSW, IVF, and ScaNN. There are three primary distance metrics: L2 or Euclidean distance, cosine similarity, and inner product. Other metrics for binary vectors include the Hamming Distance and the Jaccard Index.
In this article, we’ll cover:
Vector Similarity Metrics
L2 or Euclidean
How Does L2 Distance Work?
When Should You Use Euclidean Distance?
Cosine Similarity
How Does Cosine Similarity Work?
When Should You Use Cosine Similarity?
Inner Product
How Does Inner Product Work?
When Should You Use Inner Product?
Other Interesting Vector Similarity or Distance Metrics
Hamming Distance
Jaccard Index
Summary of Vector Similarity Search Metrics
Vector Similarity Metrics
Vectors can be represented as lists of numbers or as an orientation and a magnitude. The easiest way to picture this is to imagine vectors as line segments pointing in specific directions in space.
The L2 or Euclidean metric is the “hypotenuse” metric between two vectors. It measures the straight-line distance between the points where your vectors end.
The cosine similarity is the angle between your lines where they meet.
The inner product is the “projection” of one vector onto the other. Intuitively, it measures both the distance and angle between the vectors.
L2 or Euclidean
The most intuitive distance metric is L2 or Euclidean distance. We can imagine this as the amount of space between two objects. For example, how far your screen is from your face.
How Does L2 or Euclidean Distance Work?
So, we’ve imagined how L2 distance works in space; how does it work in math? Let’s begin by imagining both vectors as a list of numbers. Line the lists up on top of each other and subtract downwards. Then, square all of the results and add them up. Finally, take a square root.
Milvus skips the square root because rank order is the same with or without it. Skipping that operation gives the same results while lowering latency and cost and increasing throughput. Below is an example of how Euclidean or L2 distance works.
d(Queen, King) = √((0.3 − 0.5)² + (0.9 − 0.7)²)
= √((−0.2)² + (0.2)²)
= √(0.04 + 0.04)
= √0.08 ≈ 0.28
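The arithmetic above can be sketched in a few lines of plain Python. This is just an illustration of the math, not the Milvus implementation:

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors:
    subtract pairwise, square, sum, then take the square root."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_l2(a, b):
    """Milvus-style squared L2: same rank order, no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

queen = [0.3, 0.9]
king = [0.5, 0.7]
print(round(l2_distance(queen, king), 2))  # 0.28
```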
When Should You Use L2 or Euclidean Distance?
One of the main reasons to use Euclidean distance is when your vectors have different magnitudes. Use it when you primarily care about how far apart your words are in space, that is, their semantic distance.
Cosine Similarity
We use the term “cosine similarity” or “cosine distance” to denote the difference in orientation between two vectors. For example, how far would you turn to face the front door?
Fun and applicable fact: although “similarity” and “distance” have different meanings on their own, adding “cosine” before both terms makes them mean almost the same thing! This is another example of semantic similarity at play.
How Does Cosine Similarity Work?
So, we know that cosine similarity measures the angle between two vectors. Once again, we imagine our vectors as a list of numbers. The process is a bit more complex this time, though.
We begin by lining the vectors up on top of each other again. Multiply the numbers downward, then add all of the results up. Save that number and call it “x.” Next, square each entry and sum those squares within each vector, so you get one sum per vector.
Take the square root of both sums, then multiply them, and call this result “y.” Our cosine similarity is “x” divided by “y.”
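Here is the same recipe as a plain-Python sketch, with “x” and “y” named as in the steps above (again, an illustration rather than the Milvus implementation):

```python
import math

def cosine_similarity(a, b):
    # "x": multiply downward and sum up -- the dot product
    x = sum(p * q for p, q in zip(a, b))
    # "y": square each entry, sum per vector, square-root both, multiply
    y = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return x / y

print(round(cosine_similarity([0.3, 0.9], [0.5, 0.7]), 2))  # 0.96
```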
When Should You Use Cosine Similarity?
Cosine similarity is primarily used in NLP applications. The main thing that cosine similarity measures is the difference in semantic orientation. If you work with normalized vectors, cosine similarity is equivalent to the inner product.
Inner Product
The inner product is the projection of one vector onto another: how much of one vector’s length falls along the direction of the other. The bigger the angle between the two vectors, the smaller the inner product. It also scales with the magnitudes of both vectors. So, we use the inner product when we care about both orientation and distance. For example, imagine the straight line you would have to run, through the walls, to reach your refrigerator.
How Does Inner Product Work?
The inner product should look familiar. It’s just the first third of the cosine calculation: line those vectors up in your mind, go down the row multiplying downward, then sum the results. That single number captures both how long the vectors are and how closely they point in the same direction.
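Because it is only the numerator of the cosine calculation, the sketch is even shorter (illustrative Python, not the Milvus implementation):

```python
def inner_product(a, b):
    # Multiply downward, then sum -- the "x" from the cosine calculation.
    return sum(p * q for p, q in zip(a, b))

print(round(inner_product([0.3, 0.9], [0.5, 0.7]), 2))  # 0.78
```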
When Should You Use Inner Product?
The inner product is like a cross between Euclidean distance and cosine similarity. On normalized datasets, it is the same as cosine similarity, so IP suits either normalized or non-normalized datasets. It is both faster and more flexible than cosine similarity.
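We can verify the normalized-vectors claim directly. The sketch below (plain Python, assuming a simple `normalize` helper that scales a vector to unit length) shows that the inner product of two normalized vectors matches their cosine similarity:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def inner_product(a, b):
    return sum(p * q for p, q in zip(a, b))

a, b = [0.3, 0.9], [0.5, 0.7]
cosine = inner_product(a, b) / (
    math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
)
# On unit vectors, the inner product equals cosine similarity.
print(math.isclose(inner_product(normalize(a), normalize(b)), cosine))  # True
```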
One thing to keep in mind with the inner product is that it doesn’t satisfy the triangle inequality, and it favors vectors with large magnitudes. This means we should be careful when using IP with an Inverted File (IVF) index or a graph index like HNSW.
Other Interesting Vector Distance or Similarity Metrics
The three vector metrics mentioned above are the most useful regarding vector embeddings. However, they’re not the only ways to measure the distance between two vectors. Here are two other ways to measure distance or similarity between vectors.
Hamming Distance
Hamming distance can be applied to vectors or strings. For our use cases, let’s stick to vectors. Hamming distance measures the “difference” in the entries of two vectors. For example, “1011” and “0111” have a Hamming distance of 2.
In terms of vector embeddings, Hamming distance only really makes sense for binary vectors. Float vector embeddings, typically taken from the later layers of a neural network, are made up of floating point numbers such as [0.24, 0.111, 0.21, 0.51235] and [0.33, 0.664, 0.125152, 0.1].
As you can see, the Hamming distance between two float vector embeddings will almost always come out to the length of the vector itself; there are just too many possible values for each entry. That’s why Hamming distance should only be applied to binary or sparse vectors, the kind produced by processes like TF-IDF, BM25, or SPLADE.
Hamming distance is good to measure something like the difference in wording between two texts, the difference in the spelling of words, or the difference between any two binary vectors. But it’s not good for measuring the difference between vector embeddings.
Here’s a fun fact. Hamming distance is equivalent to summing the result of an XOR operation on two vectors.
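That XOR formulation is a one-liner in Python (an illustration, not the Milvus implementation), using the “1011” vs. “0111” example from above:

```python
def hamming_distance(a, b):
    # XOR each pair of bits, then count the 1s that result.
    return sum(x ^ y for x, y in zip(a, b))

print(hamming_distance([1, 0, 1, 1], [0, 1, 1, 1]))  # 2
```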
Jaccard Distance
Jaccard distance is another way to measure the similarity or distance of two vectors. The interesting thing about Jaccard is that there are both a Jaccard index and a Jaccard distance. The Jaccard distance, which is 1 minus the Jaccard index, is the metric Milvus implements.
Calculating the Jaccard distance or index is an interesting task because it doesn’t exactly make sense at first glance. Like Hamming distance, Jaccard only works on binary data. I find the traditional framing of “unions” and “intersections” confusing, so I think about it with logic: the Jaccard index is A “AND” B divided by A “OR” B, and the Jaccard distance is (A “OR” B minus A “AND” B) divided by A “OR” B.
As shown in the image above, we count the number of entries where either A or B is 1 as the “union” and where both A and B are 1 as the “intersection.” So the Jaccard index for A (01100111) and B (01010110) is ½. In this case, the Jaccard distance, 1 minus the Jaccard index, is also ½.
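The logic framing translates directly into bitwise Python (illustrative, not the Milvus implementation), using the same A and B as above:

```python
def jaccard_index(a, b):
    intersection = sum(x & y for x, y in zip(a, b))  # both A AND B are 1
    union = sum(x | y for x, y in zip(a, b))         # either A OR B is 1
    return intersection / union

def jaccard_distance(a, b):
    return 1 - jaccard_index(a, b)

A = [0, 1, 1, 0, 0, 1, 1, 1]
B = [0, 1, 0, 1, 0, 1, 1, 0]
print(jaccard_index(A, B), jaccard_distance(A, B))  # 0.5 0.5
```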
Summary of Vector Similarity Search Metrics
In this post, we learned about the three most useful vector similarity search metrics: L2 (also known as Euclidean) distance, cosine distance, and inner product. Each of these has different use cases. Euclidean is for when we care about the difference in magnitude. Cosine is for when we care about the difference in orientation. The inner product is for when we care about both magnitude and orientation.
Check these videos to learn more about Vector Similarity Metrics, or read the docs to learn how to configure these metrics in Milvus.