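In vector search, similarity is measured with mathematical metrics that quantify how close or related two vectors are. The three most common metrics are Euclidean distance (L2), cosine similarity, and inner product. Each suits different applications and types of data. Euclidean distance is a distance, so smaller values mean two vectors are closer. Cosine similarity and inner product are similarity scores, so larger values mean two vectors are more alike. Because the metric defines what "nearest" means, the choice of metric directly affects which results a search returns and how relevant they are.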
Euclidean distance measures the straight-line distance between two vectors in space. It is intuitive and suits data where both direction and magnitude matter, such as image pixel intensities. Cosine similarity, by contrast, measures the cosine of the angle between two vectors and ignores their lengths. This makes it well suited to text embeddings and other high-dimensional data where orientation, rather than magnitude, carries the semantic information. Inner product (also called dot product) equals the product of the two vectors' lengths and the cosine of the angle between them, so it reflects both magnitude and orientation. It is useful when the magnitude of the vectors and the projection of one onto the other are both meaningful.
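The three metrics can be written as plain Python functions. This is a minimal sketch using only the standard library. The function names are illustrative and are not taken from any vector database's API. Real systems use vectorized libraries and optimized index structures, but the arithmetic is the same.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    # All three metrics are only defined for vectors of equal dimension.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance: sqrt(sum((a_i - b_i)^2)). Smaller means closer."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: sum(a_i * b_i) = |a| * |b| * cos(theta). Larger means more similar."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|), in [-1, 1]; vector lengths cancel out."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return inner_product(a, b) / (norm_a * norm_b)


# Usage: b points in the same direction as a but is twice as long.
a = [1.0, 2.0]
b = [2.0, 4.0]
print(euclidean_distance(a, b))  # about 2.236, because the lengths differ
print(cosine_similarity(a, b))   # about 1.0, because the direction is identical
print(inner_product(a, b))       # 10.0, which grows with both lengths
```

Because a and b point in exactly the same direction, cosine similarity treats them as equivalent. Euclidean distance and inner product both reflect the difference in length.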
For example, in an e-commerce application, cosine similarity might be used to compare user preferences (represented as embeddings) with product embeddings to recommend items. In image processing, Euclidean distance can measure pixel-level differences. Inner product is often applied to normalized or sparse vectors. For unit-length vectors, inner product is identical to cosine similarity and skips the normalization step. The appropriate metric ultimately depends on the data type and the task. When the vectors come from a trained embedding model, the metric that model was trained with is usually the safest choice.
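The e-commerce example can be made concrete with a minimal brute-force search in Python. The product names, the 3-dimensional embeddings, and the top_k helper are hypothetical and chosen only for illustration. Production systems use learned embeddings with many more dimensions and an approximate nearest-neighbor index rather than a linear scan. On the raw vectors, the three metrics return different top results. The long "trail shoes" vector dominates the inner product, but it is far away in Euclidean terms. Once every vector is normalized to unit length, all three metrics agree. Inner product then equals cosine similarity, and squared Euclidean distance equals 2 minus twice the cosine similarity.

```python
import math


def normalize(v):
    """Scale v to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, items, k, metric):
    """Hypothetical brute-force nearest-neighbor search over a dict of vectors.

    metric is "cosine", "ip" (inner product), or "l2". Similarities are
    sorted in descending order, distances in ascending order.
    """
    if metric == "cosine":
        q = normalize(query)
        scored = [(name, dot(q, normalize(vec))) for name, vec in items.items()]
        descending = True
    elif metric == "ip":
        scored = [(name, dot(query, vec)) for name, vec in items.items()]
        descending = True
    elif metric == "l2":
        scored = [(name, l2(query, vec)) for name, vec in items.items()]
        descending = False
    else:
        raise ValueError(f"unknown metric: {metric}")
    scored.sort(key=lambda pair: pair[1], reverse=descending)
    return [name for name, _ in scored[:k]]


# Hypothetical embeddings for illustration only.
products = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail shoes": [4.0, 2.0, 1.0],  # similar direction, much larger magnitude
    "yoga mat": [0.1, 0.9, 0.2],
    "coffee maker": [0.0, 0.1, 0.9],
}
user = [0.8, 0.2, 0.1]

# On raw vectors, each metric picks a different top two.
for metric in ("cosine", "ip", "l2"):
    print(metric, top_k(user, products, k=2, metric=metric))
# cosine ['running shoes', 'trail shoes']
# ip ['trail shoes', 'running shoes']
# l2 ['running shoes', 'yoga mat']

# After normalizing, all three metrics produce the same ranking.
unit_products = {name: normalize(vec) for name, vec in products.items()}
unit_user = normalize(user)
for metric in ("cosine", "ip", "l2"):
    print(metric, top_k(unit_user, unit_products, k=4, metric=metric))
# each line lists ['running shoes', 'trail shoes', 'yoga mat', 'coffee maker']
```

This equivalence is why many systems normalize embeddings once at indexing time and then use the cheaper inner product at query time.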