Cosine similarity measures the similarity between two vectors by calculating the cosine of the angle between them. Mathematically, it is the dot product of the vectors divided by the product of their magnitudes. This metric ranges from -1 to 1, where 1 indicates identical direction (maximum similarity), 0 implies orthogonality (no similarity), and -1 signifies opposite direction. Unlike Euclidean distance, cosine similarity focuses on orientation rather than magnitude, making it ideal for comparing high-dimensional vectors where differences in scale are less meaningful. For example, in natural language processing (NLP), text embeddings often reside in high-dimensional spaces, and cosine similarity effectively captures semantic relationships regardless of vector length.
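As a concrete illustration, here is a minimal sketch that computes cosine similarity directly from that definition using NumPy; the two vectors are arbitrary example values chosen so that one is a scaled copy of the other.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product of the vectors divided by the product of their magnitudes.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction, twice the magnitude

print(cosine_similarity(a, b))  # ~1.0: orientation matters, not scale
```

Even though the two vectors are far apart by Euclidean distance, their cosine similarity is 1.0, which is exactly the scale-insensitivity described above.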
Sentence Transformers generate dense vector representations (embeddings) of sentences, optimized to preserve semantic meaning. These models, typically fine-tuned from transformer backbones such as BERT or RoBERTa, produce embeddings in which semantically similar sentences lie close together in vector space. To compare two sentences with cosine similarity, each sentence is encoded into a vector by the Sentence Transformer model, and the cosine similarity between the two vectors is computed. Because the embeddings are often normalized (scaled to unit length), the cosine similarity reduces to the dot product of the vectors, which is computationally efficient. For instance, the sentences "A cat sits on a mat" and "A kitten rests on a rug" would yield embeddings with high cosine similarity, reflecting their semantic closeness.
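The sketch below shows that workflow with the sentence-transformers library, using the example sentence pair from above; the checkpoint name "all-MiniLM-L6-v2" is just one commonly used pre-trained model and could be swapped for any other Sentence Transformer.

```python
from sentence_transformers import SentenceTransformer, util

# Example pre-trained checkpoint; any Sentence Transformer model works here.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["A cat sits on a mat", "A kitten rests on a rug"]
embeddings = model.encode(sentences, convert_to_tensor=True)

# util.cos_sim computes the cosine similarity between the two embeddings.
score = util.cos_sim(embeddings[0], embeddings[1])
print(f"Cosine similarity: {score.item():.4f}")  # high score for similar meaning
```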
This approach is widely used in semantic search, recommendation systems, and clustering. For example, a search engine might encode a user’s query and compare it to document embeddings to retrieve the most relevant results. Cosine similarity is preferred here because Sentence Transformers are trained to align the vector space such that semantic similarity corresponds to angular proximity. Unlike magnitude-sensitive metrics such as Euclidean distance, cosine similarity ignores vector magnitude, which is not meaningful when comparing embeddings of semantic content. Tools like the sentence-transformers
library simplify this workflow by providing pre-trained models and built-in functions for computing cosine similarity, enabling developers to implement semantic similarity checks with minimal effort.
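As a sketch of such a semantic-search workflow, the snippet below encodes a query and a few example documents, then ranks the documents by cosine similarity with the library's util.semantic_search helper; the document texts and the model choice are illustrative assumptions, not part of the original example.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint

# Hypothetical document collection used only for illustration.
documents = [
    "How to train a neural network",
    "Best recipes for chocolate cake",
    "Introduction to transformer architectures",
]
query = "tutorial on deep learning models"

doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# util.semantic_search ranks corpus entries by cosine similarity to the query.
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=2)[0]
for hit in hits:
    print(documents[hit["corpus_id"]], round(hit["score"], 4))
```

The two machine-learning documents should rank above the recipe, since their embeddings point in roughly the same direction as the query's.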