How does query expansion improve Lexical search recall?

Query expansion improves Lexical search recall by broadening the set of terms used to match documents, so the search system can return relevant results that do not share the query's exact wording. In traditional Lexical search, a document is retrieved only if it contains the query's tokens, so documents that express the same idea in different words are missed. Query expansion mitigates this limitation by adding related words, synonyms, or derived terms to the original query. For example, expanding “database” to include “data store,” “repository,” and “data management system” lets the query also match documents that use any of those terms.
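The “database” example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the synonym table, the sample documents, and the helper names `expand_query` and `lexical_match` are hypothetical, and plain substring matching stands in for a real lexical engine.

```python
# Hypothetical synonym table; a real system would curate this per domain.
SYNONYMS = {
    "database": ["data store", "repository", "data management system"],
}

# Hypothetical documents used only to show the effect on recall.
DOCS = [
    "Choosing a database for time-series workloads",
    "A data store for embeddings at scale",
    "Git repository hosting compared",
    "Notes on baking bread",
]


def expand_query(query, synonyms):
    """Return the query terms plus any listed synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def lexical_match(terms, docs):
    """Return indices of documents containing at least one term (OR semantics)."""
    return [i for i, doc in enumerate(docs) if any(t in doc.lower() for t in terms)]


original_hits = lexical_match(["database"], DOCS)
expanded_hits = lexical_match(expand_query("database", SYNONYMS), DOCS)
print(original_hits, expanded_hits)
```

With these sample documents, the unexpanded query matches one document and the expanded query matches three. One of the new matches is about Git repositories, which shows that the extra recall can cost some precision.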

There are multiple ways to perform query expansion. The simplest is rule-based, where developers maintain a manually curated synonym list or use linguistic resources like WordNet. More advanced approaches use statistical co-occurrence or embeddings to find semantically related terms dynamically. For instance, a system can encode vocabulary terms with an embedding model, store the resulting vectors in Milvus, and then search for the nearest neighbors of a query term's vector to obtain expansion candidates. When the whole query is embedded rather than isolated words, this method can be context-aware, which helps keep the added words relevant to the specific meaning of the query rather than to generic synonyms.
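A minimal sketch of the embedding-based idea follows. It assumes toy three-dimensional vectors invented for illustration and uses a brute-force cosine-similarity scan; in a real deployment the vectors would come from an embedding model and the neighbor lookup would be a vector search in a store such as Milvus. The names `TERM_VECTORS` and `nearest_terms` are hypothetical.

```python
import math

# Toy vectors invented for illustration; real ones come from an embedding model.
TERM_VECTORS = {
    "database": [0.9, 0.1, 0.0],
    "data store": [0.8, 0.2, 0.1],
    "repository": [0.7, 0.3, 0.2],
    "bread": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_terms(term, vectors, top_k=2, min_score=0.8):
    """Return up to top_k other terms whose similarity to `term` is at least min_score."""
    query_vec = vectors[term]
    scored = [
        (other, cosine(query_vec, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, score in scored[:top_k] if score >= min_score]


print(nearest_terms("database", TERM_VECTORS, top_k=3))
```

The `min_score` threshold is a design choice worth tuning: without it, a request for three neighbors would also return an unrelated term such as “bread” simply because it is the next closest vector.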

By improving recall, query expansion strengthens hybrid retrieval systems that integrate Milvus. After Lexical search retrieves a broader candidate set using the expanded terms, Milvus can re-rank those candidates by vector similarity to the query, which restores the precision that expansion tends to dilute. This is especially useful in technical or domain-specific contexts, where different terms point to closely related concepts, such as “vector index” and “ANN search.” In this setup, Lexical search captures the surface-level diversity of language, while Milvus pushes the results that best match the user’s intent to the top. The result is a more robust and flexible search pipeline that balances recall and relevance.
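The two-stage flow described above can be sketched as follows. Everything here is an assumption made for illustration: the documents, their two-dimensional vectors, the query vector, and the helpers `lexical_candidates` and `rerank` are hypothetical, and an in-memory cosine sort stands in for the similarity search that Milvus would perform over the candidate set.

```python
import math

# Hypothetical corpus: document id -> (text, toy embedding vector).
DOCS = {
    "funds": ("Index funds for beginners", [0.1, 0.9]),
    "ann": ("ANN search explained", [0.8, 0.3]),
    "index": ("Tuning a vector index for low latency", [0.9, 0.1]),
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def lexical_candidates(terms, docs):
    """Stage 1: ids of documents containing at least one (expanded) term."""
    return [
        doc_id
        for doc_id, (text, _) in docs.items()
        if any(term in text.lower() for term in terms)
    ]


def rerank(candidates, query_vec, docs):
    """Stage 2: order candidates by vector similarity to the query, best first."""
    return sorted(candidates, key=lambda d: cosine(query_vec, docs[d][1]), reverse=True)


expanded_terms = ["vector index", "ann search", "index"]  # assumed expansion output
query_vec = [1.0, 0.0]  # assumed embedding of the user's query

candidates = lexical_candidates(expanded_terms, DOCS)
print(rerank(candidates, query_vec, DOCS))
```

In this toy corpus the broad term “index” pulls in a finance article, and the semantic re-ranking step moves it to the bottom of the list while the two relevant documents rise to the top.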