Lexical search results can be re-ranked using embeddings by combining keyword-based precision with semantic similarity scoring from a vector database such as Milvus. The typical workflow begins with lexical retrieval using BM25 or TF-IDF to identify documents that match the query terms exactly. This step ensures precision: documents that clearly mention the query's main words are included. Once this initial candidate set is retrieved, embeddings representing both the query and the candidate documents are fetched from Milvus. The system then computes vector similarity (e.g., cosine or inner product) to measure how semantically close each document is to the query.
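The re-ranking step above can be sketched as follows. This is a minimal illustration, not Milvus API code: the embeddings are hypothetical stand-ins for vectors you would fetch from a Milvus collection, and cosine similarity is computed locally over the BM25 candidate set.

```python
import numpy as np

def cosine_similarity(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of document vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return d @ q  # one similarity score per document

# Hypothetical embeddings standing in for vectors fetched from Milvus.
rng = np.random.default_rng(0)
query_emb = rng.normal(size=128)
candidate_embs = rng.normal(size=(5, 128))  # 5 documents returned by BM25

sims = cosine_similarity(query_emb, candidate_embs)
reranked = np.argsort(-sims)  # candidate indices, most semantically similar first
```

In a real pipeline, `candidate_embs` would be retrieved by primary key from Milvus for just the BM25 hits, so the expensive similarity computation runs only over a small candidate set rather than the whole corpus.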
The second step involves blending the lexical and vector scores into a unified ranking. Developers often use weighted linear combinations or machine learning models to balance the influence of both score types. For example, a hybrid scoring formula might look like: FinalScore = α × BM25 + (1 − α) × CosineSimilarity, where α is tuned on validation results. Because BM25 scores are unbounded while cosine similarity lies in [−1, 1], both scores are typically min-max normalized before blending. A higher α prioritizes keyword precision, while a lower α increases the impact of semantic understanding. This lets the system adjust retrieval behavior per use case: legal search might favor α = 0.8 (precision-heavy), while conversational question answering might favor α = 0.4 (semantic-heavy).
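A minimal sketch of the blending formula, with min-max normalization applied first so the two score scales are comparable. The score values are hypothetical, chosen only to illustrate the effect of α:

```python
def min_max(scores):
    """Rescale scores to [0, 1]; constant inputs map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(bm25_scores, cosine_scores, alpha=0.6):
    """FinalScore = alpha * BM25 + (1 - alpha) * CosineSimilarity (normalized)."""
    b = min_max(bm25_scores)
    c = min_max(cosine_scores)
    return [alpha * bi + (1 - alpha) * ci for bi, ci in zip(b, c)]

# Toy scores for three candidate documents (hypothetical values).
bm25 = [12.4, 8.1, 10.3]
cosine = [0.52, 0.81, 0.47]

# Precision-heavy setting (e.g., legal search): keyword scores dominate.
final = hybrid_scores(bm25, cosine, alpha=0.8)
ranking = sorted(range(len(final)), key=lambda i: -final[i])
```

With α = 0.8 the keyword-strong document 0 wins; rerunning with α = 0.4 would let document 1, which has the highest semantic similarity, overtake it. Tuning α on held-out queries is what makes this trade-off principled rather than guesswork.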
Using Milvus to manage and query embeddings makes re-ranking scalable and efficient, even at large volumes. The result is a hybrid retrieval pipeline that captures both the surface-level accuracy of lexical search and the conceptual understanding of vector search. In practice, this improves ranking stability: documents that match keywords but lack contextual relevance are pushed down, while those expressing similar meaning rise to the top. This technique is now widely used in intelligent search and RAG (retrieval-augmented generation) systems, as it combines the strengths of symbolic and semantic representations for more human-like relevance judgment.
