What is lexical search and how does it work?

Lexical search is a text retrieval method that matches documents to queries based on exact words and their frequencies rather than semantic meaning. It operates at the word or token level, relying on direct text comparisons between the query and indexed documents. The core concept is simple: documents containing the query terms are considered relevant, and their relevance is quantified by ranking models such as TF-IDF or BM25. Lexical search does not interpret context or synonyms; it retrieves results purely based on how the query words appear in the text.
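The literal nature of this matching can be illustrated with a minimal Python sketch. The `tokenize` and `matches` helpers are hypothetical names introduced here, and lowercased alphanumeric splitting is a simplifying assumption; production engines typically use configurable analyzers.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (simplified analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(query, document):
    """Return True if the document shares at least one token with the query."""
    return bool(set(tokenize(query)) & set(tokenize(document)))

docs = [
    "The car dealership opens at nine.",
    "Our automobile showroom has new models.",
]

# Only the document containing the literal token "car" is returned;
# the synonym "automobile" is invisible to a purely lexical match.
hits = [d for d in docs if matches("car", d)]
print(hits)
```

The second document is about the same subject but is missed, because no token in it equals the query token.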

The process begins with tokenization and indexing. During indexing, each document is broken into tokens (words, stems, or n-grams) that are stored in an inverted index, which maps each term to the documents where it occurs. At query time, the system applies the same tokenization to the user’s input and looks up the documents that contain the resulting terms. A ranking function then scores each matching document based on how often the terms appear in it and how distinctive they are across the corpus. For instance, if “vector” appears frequently in one document but rarely elsewhere, that document ranks higher for the query “vector database.”
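The indexing and scoring steps can be sketched in a few lines of Python. This is an illustrative toy, not how any particular engine implements it: the function names `build_index` and `bm25_search` are hypothetical, the tokenizer is a simple regex, and `k1=1.5` and `b=0.75` are commonly used BM25 defaults assumed here rather than values taken from the text.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split into alphanumeric tokens (simplified analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    doc_lengths = []
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        doc_lengths.append(len(tokens))
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_lengths

def bm25_search(query, index, doc_lengths, k1=1.5, b=0.75):
    """Score documents that contain at least one query term using BM25."""
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths) / n_docs
    scores = defaultdict(float)
    for term in set(tokenize(query)):  # same tokenization as at index time
        postings = index.get(term, {})
        if not postings:
            continue
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Term frequency saturates and is normalized by document length.
            norm = tf + k1 * (1 - b + b * doc_lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = [
    "A vector database stores vector embeddings for vector search.",
    "A relational database stores rows and columns.",
    "Cooking pasta requires boiling water.",
]
index, doc_lengths = build_index(docs)
results = bm25_search("vector database", index, doc_lengths)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The first document ranks highest because “vector” is both frequent in it and absent from the other documents. The second matches only on “database”, and the third is never scored because it shares no terms with the query.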

Lexical search is often used alongside vector databases such as Milvus to build hybrid systems. While lexical search provides speed and precision for exact text matching, Milvus adds semantic retrieval using vector embeddings. For example, a lexical search for “database index” returns documents that contain those exact terms, while Milvus can find related concepts like “storage optimization.” This combination allows developers to achieve both literal and contextual relevance, making lexical search a foundational component in scalable, hybrid information retrieval pipelines.
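One way to combine the two result lists is reciprocal rank fusion (RRF). The text above does not say which fusion method a given hybrid system uses, so treat this as an assumed, illustrative choice. The document IDs below are hypothetical placeholders standing in for the output of a lexical ranker and a vector search, and `k=60` is a conventional constant rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs; each doc scores the sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical ranked results for the same query from two retrievers.
lexical_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from a BM25 ranker
semantic_hits = ["doc_d", "doc_b", "doc_a"]  # e.g. from a vector search

fused = reciprocal_rank_fusion([lexical_hits, semantic_hits])
print(fused)
```

Documents that appear in both lists rise to the top, while documents found by only one retriever are still kept. Because RRF uses ranks rather than raw scores, BM25 scores and vector distances never need to be put on a common scale.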