Contrastive learning generates embeddings by training a model to bring similar data points closer together in the embedding space while pushing dissimilar ones apart. This is achieved using pairs or triplets of data points, where "positive" pairs are similar (e.g., two augmented views of the same image) and "negative" pairs are dissimilar (e.g., different images).
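One concrete way to express this pull-together/push-apart objective is the classic margin-based pair loss. The sketch below assumes PyTorch; the function name and the `margin` value are illustrative, not a specific library API.

```python
import torch
import torch.nn.functional as F

def pairwise_contrastive_loss(emb_a, emb_b, is_positive, margin=1.0):
    """Margin-based contrastive loss over embedding pairs.

    emb_a, emb_b: (batch, dim) embeddings of the two items in each pair.
    is_positive: (batch,) float tensor, 1.0 for similar pairs, 0.0 for dissimilar.
    """
    # Euclidean distance between the two embeddings in each pair.
    dist = F.pairwise_distance(emb_a, emb_b)
    # Positive pairs are pulled together (distance driven toward zero);
    # negative pairs are pushed apart until they exceed the margin.
    pos_term = is_positive * dist.pow(2)
    neg_term = (1.0 - is_positive) * F.relu(margin - dist).pow(2)
    return (pos_term + neg_term).mean()
```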
A common objective function for contrastive learning is the InfoNCE loss, which scores each positive pair against a set of negatives with a softmax cross-entropy, pushing positive-pair similarity up and negative-pair similarity down. Models such as SimCLR and CLIP leverage contrastive learning to produce high-quality embeddings for images, text, and other modalities.
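A minimal InfoNCE sketch in PyTorch is shown below. The `temperature` of 0.07 and the convention that each query's positive key shares its batch index are assumptions for illustration; SimCLR and CLIP use variants of this formulation rather than exactly this code.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(queries, keys, temperature=0.07):
    """InfoNCE: each query's positive is the key at the same index;
    all other keys in the batch act as negatives.

    queries, keys: (batch, dim) embeddings, e.g. two augmented views
    of the same images encoded by the model.
    """
    # Cosine similarity via L2-normalized dot products.
    q = F.normalize(queries, dim=1)
    k = F.normalize(keys, dim=1)
    logits = q @ k.t() / temperature  # (batch, batch) similarity matrix
    # The matching index is the positive; cross-entropy raises its
    # similarity and lowers all other (negative) similarities.
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```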
Contrastive learning is particularly effective in self-supervised settings, where labeled data is scarce. By using augmentations or natural relationships in the data, it generates embeddings that generalize well to downstream tasks like classification, clustering, and retrieval.
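To make the downstream-use claim concrete, here is a hedged sketch of embedding-based retrieval, assuming a frozen, contrastively trained encoder has already produced the embeddings; `retrieve_top_k` and its parameters are hypothetical names for illustration.

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(query_emb, corpus_embs, k=5):
    """Rank corpus items by cosine similarity to a query embedding.

    query_emb: (dim,) embedding of the query item.
    corpus_embs: (n_items, dim) embeddings from the frozen encoder.
    """
    q = F.normalize(query_emb, dim=0)
    c = F.normalize(corpus_embs, dim=1)
    scores = c @ q                 # (n_items,) cosine similarities
    return torch.topk(scores, k)   # top-k scores and their indices
```

The same frozen embeddings can feed a lightweight classifier or a clustering algorithm, which is what makes contrastively trained encoders reusable across downstream tasks.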