How can developers tune BM25 for better lexical search results?

BM25 remains one of the most reliable scoring algorithms for lexical search, and developers can tune its parameters to fit specific datasets and query types. BM25 ranks documents using term frequency, inverse document frequency, and document length normalization, governed by two main parameters: k1 and b. The k1 parameter controls how quickly term frequency saturates: higher values let repeated occurrences of a term keep raising the score for longer. The b parameter controls how strongly document length affects scores: b = 0 disables length normalization, and b = 1 applies it fully. Tuning these lets developers adapt BM25 to the corpus, whether it contains short product descriptions, long-form articles, or technical manuals.
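To make the roles of k1 and b concrete, here is a minimal from-scratch sketch of the standard Okapi BM25 formula. It assumes documents and queries are already tokenized into lists of lowercase strings and uses the Lucene-style IDF variant. Search engines differ in IDF and tokenization details, so treat this as an illustration rather than any particular engine's implementation. The function name `bm25_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every tokenized document in `docs` against the tokenized `query`.

    IDF uses the Lucene-style variant log(1 + (N - df + 0.5) / (df + 0.5)),
    which stays positive even for terms that appear in most documents.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: b=0 ignores length, b=1 scales fully by len/avgdl.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # a term absent from this document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF: the contribution approaches idf * (k1 + 1) as tf grows.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["fast", "vector", "search"],
        ["vector", "database", "for", "vector", "search", "at", "scale"],
        ["keyword", "search"],
    ]
    query = ["vector", "search"]
    print("b=0.0:", bm25_scores(query, docs, b=0.0))
    print("b=1.0:", bm25_scores(query, docs, b=1.0))
```

On this toy corpus, b = 0.0 ranks the longer second document first because it repeats "vector". With b = 1.0, the short, focused first document overtakes it, which shows how b trades raw term counts against document length.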

For example, when searching over short texts such as FAQs or metadata fields, a lower b (e.g., 0.2–0.4) weakens length normalization, so small length differences between otherwise similar short documents do not dominate the ranking. In contrast, for longer texts like blog posts or research papers, a higher b (e.g., 0.6–0.8) keeps long documents from outranking shorter ones simply because their length gives them more chances to repeat query terms. Similarly, k1 can be set between roughly 1.0 and 2.0 depending on how much weight repeated terms should receive. Developers can evaluate combinations empirically by measuring metrics such as Mean Average Precision (MAP) or Normalized Discounted Cumulative Gain (NDCG) on a validation set of queries with relevance judgments.
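The empirical evaluation described above can be sketched as a simple grid search. It scores every validation query under each (k1, b) pair, ranks the documents, and keeps the pair with the highest mean NDCG@k. This is a self-contained sketch with several assumptions: documents are already tokenized, the validation set is a hypothetical list of (query tokens, {doc_index: graded relevance}) pairs, and NDCG uses linear gains. The helper names are illustrative. A compact BM25 scorer is repeated so the block runs on its own.

```python
import itertools
import math
from collections import Counter


def bm25_scores(query, docs, k1, b):
    """Compact Okapi BM25 with Lucene-style IDF; returns one score per document."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl)
        scores.append(sum(
            math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) * tf[t] * (k1 + 1) / (tf[t] + norm)
            for t in query if tf[t] > 0
        ))
    return scores


def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k with linear gains; `relevance` maps doc index -> graded label (missing = 0)."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def grid_search(docs, validation, k1_values, b_values, k=10):
    """Return (k1, b, mean NDCG@k) for the best-scoring parameter pair."""
    best = None
    for k1, b in itertools.product(k1_values, b_values):
        total = 0.0
        for query, relevance in validation:
            scores = bm25_scores(query, docs, k1, b)
            ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
            total += ndcg_at_k(ranked, relevance, k)
        mean_ndcg = total / len(validation)
        # Strict '>' keeps the first pair found when several tie.
        if best is None or mean_ndcg > best[2]:
            best = (k1, b, mean_ndcg)
    return best


if __name__ == "__main__":
    docs = [
        ["fast", "vector", "search"],
        ["vector", "database", "for", "vector", "search", "at", "scale"],
        ["keyword", "search"],
    ]
    # Hypothetical judgment: document 0 is the only relevant hit for this query.
    validation = [(["vector", "search"], {0: 1})]
    print(grid_search(docs, validation, k1_values=[1.2, 2.0], b_values=[0.0, 0.75]))
```

On this toy set, the pairs with b = 0.75 rank document 0 first (NDCG@10 = 1.0), while b = 0.0 pushes the longer document ahead of it. The search therefore returns (1.2, 0.75, 1.0). A real validation set needs many queries, and a held-out test split helps confirm that the chosen values are not overfit to it.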

Tuning BM25 can also improve the effectiveness of hybrid search with Milvus. Once BM25 produces strong lexical results, those scores can be combined with embedding similarity from a vector database to produce more consistent rankings. For instance, if BM25 returns a top-50 candidate list based on exact keyword matches, developers can then apply vector-based re-ranking to reorder those candidates by semantic similarity. Pairing well-tuned BM25 with Milvus similarity scoring helps the final results stay both precise and contextually relevant, combining keyword accuracy with conceptual relevance.
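The re-ranking step can be illustrated with a library-independent Python sketch. It assumes BM25 scores are already computed and that each candidate document has an embedding. In a real system the vector side would typically come from a vector database such as Milvus, but here the vectors are hypothetical toy values. The sketch takes the top-N BM25 hits, min-max normalizes the BM25 and cosine scores, and blends them with a weight `alpha`. This weighted linear fusion is one common choice; it is not Milvus's API or built-in ranker.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(scores):
    """Rescale a dict of scores to [0, 1] so lexical and semantic scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (s - lo) / (hi - lo) for key, s in scores.items()}


def hybrid_rerank(bm25_scores, query_vec, doc_vecs, top_n=50, alpha=0.5):
    """Re-rank the top_n BM25 candidates by alpha * lexical + (1 - alpha) * semantic.

    bm25_scores: doc_id -> BM25 score; doc_vecs: doc_id -> embedding vector.
    """
    top = sorted(bm25_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    candidates = dict(top)
    lexical = min_max(candidates)
    semantic = min_max({d: cosine(query_vec, doc_vecs[d]) for d in candidates})
    fused = {d: alpha * lexical[d] + (1 - alpha) * semantic[d] for d in candidates}
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    bm25 = {"a": 7.1, "b": 6.8, "c": 2.0}
    # Toy 2-d "embeddings" (hypothetical); real ones would come from an embedding model.
    query_vec = [1.0, 0.0]
    doc_vecs = {"a": [0.0, 1.0], "b": [1.0, 0.1], "c": [0.6, 0.8]}
    print(hybrid_rerank(bm25, query_vec, doc_vecs, alpha=1.0))
    print(hybrid_rerank(bm25, query_vec, doc_vecs, alpha=0.5))
```

With alpha = 1.0 the order is pure BM25 (a, b, c). At alpha = 0.5, document b, which nearly ties a lexically but matches the query vector far better, moves to the top. Min-max fusion is only one option: reciprocal rank fusion combines ranks instead of raw scores and avoids score normalization entirely.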