Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them. It ranges from -1 (the vectors point in opposite directions) through 0 (the vectors are orthogonal, indicating no similarity) to 1 (the vectors point in the same direction). It is widely used to compare embeddings, such as word, document, or image embeddings, by evaluating how closely they align in direction in the vector space.
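As an illustration of the definition, cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. The following is a minimal sketch using only the Python standard library; the function name cosine_similarity and the choice to raise an error for zero vectors are assumptions made for this example, not part of any particular library.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (||a|| * ||b||) for two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # The angle is undefined when either vector has zero length
    # (raising here is an assumption of this sketch).
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors chosen to hit the three landmark values of the range.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # exactly 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # close to -1
```

The first and third results may differ from 1 and -1 by a tiny floating-point rounding error, so comparisons against those values are best done with a tolerance such as math.isclose.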
In practice, cosine similarity is used to determine how similar two data points are by comparing their embeddings. For example, a text-based recommendation system can find the products or articles most similar to a given query by computing the cosine similarity between the query embedding and the embedding of every item in the database, then returning the highest-scoring items.
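The recommendation scenario can be sketched as a brute-force search: score every stored embedding against the query and keep the top results. Everything specific here is hypothetical and chosen for illustration only: the catalog dictionary, the item names, the three-dimensional toy embeddings, and the helper most_similar. Real embeddings typically have hundreds of dimensions, and large collections usually rely on a vector index rather than a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    items: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k (name, score) pairs with the highest cosine similarity."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical item embeddings (made-up numbers, not real model output).
catalog = {
    "article_a": [0.9, 0.1, 0.0],
    "article_b": [0.0, 1.0, 0.2],
    "article_c": [0.8, 0.3, 0.1],
}
query_embedding = [1.0, 0.2, 0.0]

top_matches = most_similar(query_embedding, catalog, k=2)
top_names = [name for name, _ in top_matches]
```

With these toy values, article_a and article_c point in nearly the same direction as the query, so they rank ahead of article_b.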
Cosine similarity is often preferred for embedding-based similarity search because it is scale-invariant: it depends on the angle between vectors, not on their magnitudes, so multiplying either vector by a positive constant leaves the score unchanged. This lets it compare embeddings whose magnitudes differ, which is common in machine learning applications, and it works whether or not the embeddings have been normalized. The vectors must still have the same number of dimensions. For embeddings that are already normalized to unit length, cosine similarity equals the dot product.
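The scale-invariance property can be checked directly. The sketch below, with illustrative helper names (dot, norm, normalize) and toy vectors that are assumptions of this example, computes the score for a pair of vectors, for the same pair after one vector is multiplied by 10, and as a plain dot product after both vectors are rescaled to unit length.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumes neither vector is the zero vector.
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: Sequence[float]) -> List[float]:
    """Rescale a non-zero vector to unit length."""
    length = norm(a)
    return [x / length for x in a]


a = [3.0, 4.0]
b = [4.0, 3.0]

base_score = cosine_similarity(a, b)
# Scaling one vector by a positive constant changes its magnitude, not its direction.
scaled_score = cosine_similarity([10.0 * x for x in a], b)
# For unit-length vectors the denominator is 1, so the dot product alone suffices.
unit_dot_score = dot(normalize(a), normalize(b))
```

All three values agree up to floating-point rounding. Note that scaling by a negative constant reverses the vector's direction and therefore flips the sign of the score.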