To compute cosine similarity between two sentence embeddings in Python, you typically use NumPy or scikit-learn. Here are two straightforward approaches:
## Using NumPy
Cosine similarity measures the cosine of the angle between two vectors: the dot product of the vectors divided by the product of their magnitudes. For two embeddings `emb1` and `emb2` (1D arrays), the code is:
```python
import numpy as np

def cosine_similarity(emb1, emb2):
    # Dot product of the two vectors...
    dot_product = np.dot(emb1, emb2)
    # ...divided by the product of their magnitudes
    norm_emb1 = np.linalg.norm(emb1)
    norm_emb2 = np.linalg.norm(emb2)
    return dot_product / (norm_emb1 * norm_emb2)
```
This works for single vectors. If your embeddings are 2D (e.g., from a batch process), flatten them first with `emb1.flatten()`.
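A minimal sketch of that case, assuming a hypothetical 2D `batch` array with one embedding per row:

```python
# Hypothetical batch output: one embedding per row, shape (2, 3)
batch = np.array([[0.2, 0.5, 0.8],
                  [0.3, 0.4, 0.7]])

emb1 = batch[0].flatten()  # shape (3,), ready for the function above
emb2 = batch[1].flatten()
```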
## Using scikit-learn
For batch operations or pairwise comparisons, `sklearn.metrics.pairwise.cosine_similarity` is efficient. For two embeddings stored as 2D arrays of shape `(1, embedding_size)`:
```python
# Note: this import shares a name with the NumPy function defined above
from sklearn.metrics.pairwise import cosine_similarity

similarity = cosine_similarity(emb1.reshape(1, -1), emb2.reshape(1, -1))[0][0]
```
The `reshape` calls convert 1D vectors to the 2D shape scikit-learn expects, and the `[0][0]` indexing extracts a scalar from the resulting 1×1 matrix.
## Key Considerations
- **Normalization**: If embeddings are pre-normalized to unit length (common in libraries like Sentence Transformers), cosine similarity simplifies to a dot product: `np.dot(emb1, emb2.T)`.
- **Batch processing**: For multiple embeddings, pass a matrix to `cosine_similarity` to get pairwise results (see the sketch after this list).
- **Performance**: NumPy is sufficient for small-scale tasks; scikit-learn optimizes batch operations.
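As a sketch of the first two points, assume a hypothetical `embeddings` matrix with one vector per row (the values are made up for illustration):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical batch: three embeddings, one per row
embeddings = np.array([[0.2, 0.5, 0.8],
                       [0.3, 0.4, 0.7],
                       [0.9, 0.1, 0.2]])

# Pairwise similarities: entry (i, j) compares row i with row j
pairwise = cosine_similarity(embeddings)  # shape (3, 3)

# After unit-normalizing the rows, a plain dot product gives the same matrix
normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
pairwise_via_dot = normalized @ normalized.T
```

This dot-product shortcut is why pre-normalized embeddings are convenient: similarity over a batch reduces to a single matrix multiplication.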
Example with actual values, using the NumPy function defined above:
```python
emb1 = np.array([0.2, 0.5, 0.8])
emb2 = np.array([0.3, 0.4, 0.7])
print(cosine_similarity(emb1, emb2))  # Output: ~0.988
```
This approach works with embeddings from any framework (PyTorch, TensorFlow) by converting the tensors to NumPy arrays first.
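For instance, a minimal sketch of the conversions, assuming hypothetical tensors `pt_emb` (PyTorch) and `tf_emb` (TensorFlow):

```python
# pt_emb and tf_emb are placeholder tensors, not defined here
emb1 = pt_emb.detach().cpu().numpy()  # PyTorch: detach from the graph, move to CPU, convert
emb2 = tf_emb.numpy()                 # TensorFlow (eager mode): convert directly
```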