Lexical search results can feed into vector re-ranking by serving as the first filtering stage in a hybrid retrieval pipeline. In this process, a lexical search, using an algorithm like BM25 or TF-IDF, first retrieves documents that explicitly match the query's keywords. This keeps the candidate set precise and cheap to build, since only documents containing the query terms are selected. Once this candidate set is retrieved, the query is embedded and the candidates' precomputed embeddings are fetched from a vector database such as Milvus. These embeddings capture semantic relationships, allowing the system to measure conceptual similarity beyond word overlap.
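A minimal sketch of this first stage is below, assuming the rank_bm25 and pymilvus packages, a locally running Milvus instance, and a collection named "docs" whose primary keys match the corpus indices and whose "embedding" field holds precomputed document vectors (the collection name, field names, and URI are all hypothetical):

```python
# Stage 1: lexical retrieval with BM25, then fetch candidate embeddings from Milvus.
from rank_bm25 import BM25Okapi
from pymilvus import MilvusClient

corpus = [
    "Milvus is a vector database built for scalable similarity search.",
    "BM25 ranks documents by term frequency and inverse document frequency.",
    "Hybrid retrieval combines lexical and semantic signals.",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

query = "combine keyword and vector search"
bm25_scores = bm25.get_scores(query.lower().split())

# Keep only the top-k lexical candidates (k = 2 here for brevity).
top_k = sorted(range(len(corpus)), key=lambda i: bm25_scores[i], reverse=True)[:2]

# Fetch the stored embeddings for just those candidates from Milvus.
client = MilvusClient(uri="http://localhost:19530")  # assumed local deployment
hits = client.get(collection_name="docs", ids=top_k, output_fields=["embedding"])
# Assumes the primary key field is named "id".
candidate_embeddings = {hit["id"]: hit["embedding"] for hit in hits}
```

Because only the lexical top-k reaches Milvus, the expensive semantic comparison is limited to a small, already keyword-relevant pool.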
The second step is re-ranking. Each document from the lexical results is assigned a semantic similarity score based on the distance between its embedding and the query's embedding. Milvus is particularly effective here because it can compute these similarity scores quickly even for large datasets, thanks to approximate nearest neighbor (ANN) indexing. Developers can then combine both scores, lexical and vector, using a weighted formula or a machine learning model. For example, after normalizing the BM25 scores to a comparable range, a hybrid score might be computed as FinalScore = 0.6 × BM25 + 0.4 × CosineSimilarity, giving both methods influence over the final ranking.
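The fusion step itself needs nothing Milvus-specific. The sketch below applies the weighted formula above to toy scores and embeddings (all values illustrative), with min-max normalization added as an assumption, since raw BM25 scores are unbounded while cosine similarity lies in [-1, 1]:

```python
import numpy as np

def minmax(x):
    """Rescale scores to [0, 1] so BM25 and cosine values are comparable."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span else np.zeros_like(x)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy inputs: BM25 scores and embeddings for three lexical candidates,
# plus a query embedding (all values illustrative, not from a real model).
bm25_scores = [12.4, 9.1, 7.8]
query_vec = [0.2, 0.7, 0.1]
doc_vecs = [[0.1, 0.8, 0.0], [0.9, 0.1, 0.2], [0.3, 0.6, 0.2]]

sem_scores = [cosine(query_vec, v) for v in doc_vecs]
bm25_norm = minmax(bm25_scores)

# FinalScore = 0.6 * BM25 + 0.4 * CosineSimilarity, as in the example above.
final = [0.6 * b + 0.4 * s for b, s in zip(bm25_norm, sem_scores)]
ranking = sorted(range(len(final)), key=lambda i: final[i], reverse=True)
print(ranking)  # candidate indices in re-ranked order
```

The 0.6/0.4 weights simply mirror the example formula; in practice they are usually tuned on a validation set for the target workload.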
This two-stage pipeline improves retrieval quality by combining the strengths of both methods. Lexical search filters out irrelevant or off-topic documents early, reducing computational load. The vector-based re-ranking step, powered by Milvus, then ensures that semantically relevant documents rise to the top, even when they don't share exact keywords with the query. For developers, this approach offers the best of both worlds: speed and interpretability from lexical search, and deeper contextual understanding from embeddings. It's especially useful for applications like enterprise search, RAG systems, and customer support knowledge bases, where precision and meaning must coexist in every query.
