Embeddings are the foundation that makes vector search possible: they convert raw unstructured data into numerical vectors that capture semantic meaning and relationships. These vectors serve as a universal language that lets computers understand and compare different pieces of information. The classic example used to demonstrate embeddings is Word2Vec, which shows how word embeddings capture semantic relationships - for instance, "king" - "man" + "woman" ≈ "queen", or how the embedding for "Marlon_Brando" lands close to those of other actors and his famous movies. Embeddings transform complex data into a format where similarity can be measured mathematically using distance metrics such as cosine similarity or Euclidean distance. This mathematical representation is what enables efficient searching and comparison of unstructured data.
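To make the distance metrics concrete, here is a minimal sketch using NumPy with made-up 4-dimensional vectors (real embeddings typically have hundreds or thousands of dimensions); the vector values are purely illustrative, not taken from any real model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    # 1.0 means the vectors point in exactly the same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Euclidean (L2) distance: straight-line distance between the two points.
    return float(np.linalg.norm(a - b))

# Toy "word vectors" chosen by hand for illustration.
king  = np.array([0.8, 0.65, 0.1, 0.05])
man   = np.array([0.7, 0.10, 0.1, 0.05])
woman = np.array([0.7, 0.10, 0.8, 0.05])
queen = np.array([0.8, 0.65, 0.8, 0.05])

# The famous analogy: king - man + woman should land near queen.
analogy = king - man + woman
print(cosine_similarity(analogy, queen))   # close to 1.0 (very similar)
print(euclidean_distance(analogy, queen))  # close to 0.0 (very near)
```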
The process typically involves using pre-trained machine learning models to generate these embeddings - for example, ResNet-50 for images or BERT for text. The resulting vectors place semantically similar items closer together in the high-dimensional space, while dissimilar items end up farther apart. This geometric property is what enables vector databases to perform efficient similarity search using techniques like approximate nearest neighbor (ANN) algorithms. The quality and usefulness of vector search largely depend on how well the embedding model captures the relevant semantic features of the data.
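As an illustration of this workflow, the sketch below uses the sentence-transformers library with the all-MiniLM-L6-v2 model, a compact BERT-style encoder (the specific library and model are my choice for the example, not the only option), to embed a few sentences and compare them:

```python
from sentence_transformers import SentenceTransformer, util

# Load a pre-trained text embedding model (downloads weights on first use).
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A chef is preparing pasta in the kitchen.",
    "Someone is cooking dinner.",
    "The stock market fell sharply today.",
]

# Encode each sentence into a fixed-size vector.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384) - one 384-dimensional vector per sentence

# Semantically similar sentences score higher under cosine similarity.
print(util.cos_sim(embeddings[0], embeddings[1]))  # cooking vs. cooking: high
print(util.cos_sim(embeddings[0], embeddings[2]))  # cooking vs. finance: low
```

In practice, a vector database would store these embeddings and build an ANN index over them, so that the nearest neighbors of a query vector can be retrieved efficiently even across millions of items.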