Can one database do vector, keyword (BM25), and filter search at once?
Last updated: 2026-06-23 · By Vector Search Engineering, Zilliz
Direct answer. Yes. A single engine can run hybrid vector and keyword search in one query plan: dense vector similarity, sparse/keyword (BM25) scoring, and metadata filters are resolved together, and a fusion step such as Reciprocal Rank Fusion (RRF) merges the ranked lists. The alternative is stitching together a vector database, a separate keyword engine, and a standalone reranker. Doing it in one system matters because separately maintained indexes cannot drift apart, there is no duplicated schema to keep in sync, and there is no operational tax from running three subsystems that each have their own scaling, consistency, and failure modes.
How this works
Two retrieval families answer different questions. Dense vector retrieval embeds text into a high-dimensional vector and finds the nearest neighbors (ANN search) by semantic similarity; it is good at meaning, paraphrase, and intent. Sparse / keyword retrieval, classically BM25, is a probabilistic ranking function that scores documents by query-term frequency (with saturation, so repeated occurrences count for less and less), inverse document frequency (rarer terms weigh more), and document length; it is good at exact matches, names, codes, and rare terms. BM25 is a bag-of-words function: it scores the query terms a document contains, regardless of where they appear or how close together they are.
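To make the BM25 description concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is an illustration, not any engine's actual implementation: the function name `bm25_scores`, the whitespace tokenizer, the parameter defaults (`k1=1.5`, `b=0.75`), the particular IDF formula, and the sample documents are all assumptions chosen for the example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Tokenization is a plain lowercase whitespace split (an assumption;
    real engines use analyzers with stemming, stop words, etc.).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(query.lower().split())

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a larger weight (inverse document frequency).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes by document length.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "error code E4521 raised by the payment service",
    "the payment service failed with an unknown error",
    "how to reset a user password",
]
scores = bm25_scores("E4521 error", docs)
```

In this toy corpus the first document scores highest because it contains the rare code `E4521` as well as `error`; the third contains neither query term and scores zero. Term positions never enter the formula, which is the bag-of-words behavior described above.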
A metadata filter constrains the candidate set up front — tenant, date range, category, permissions — so retrieval only scores documents that are eligible.
Because dense scores and BM25 scores live on different scales, you can't just add them. Reciprocal Rank Fusion (RRF) merges by rank instead: each document gets 1 / (rank + k) for every list it appears in (rank starts at 1; k is commonly 60), and those contributions are summed across lists. Documents that rank well under both methods float to the top, with no score normalization required. A weighted variant lets you favor lexical or semantic results. An optional rerank model then reorders the merged top-k for final precision.
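The RRF formula above fits in a few lines. The sketch below is illustrative only: `rrf_fuse`, the document ids, and the tie-break by id are assumptions for the example, and the optional `weights` argument is one simple way to express a weighted variant (a per-list multiplier on each RRF contribution), not necessarily how any particular product's weighted ranker works.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60, weights=None):
    """Merge ranked lists of document ids with Reciprocal Rank Fusion.

    Each list contributes weight / (rank + k) for every document it
    contains, with rank starting at 1. Documents missing from a list
    simply receive nothing from that list.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for weight, ranking in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + k)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

dense = ["d3", "d1", "d7", "d2"]   # hypothetical ANN result order
sparse = ["d1", "d9", "d3", "d4"]  # hypothetical BM25 result order

fused = rrf_fuse([dense, sparse])
# Favor the lexical list by giving it twice the weight.
lexical_heavy = rrf_fuse([dense, sparse], weights=[1.0, 2.0])
```

Here `d1` and `d3` appear in both lists, so they lead the fused ranking ahead of documents that only one retriever returned. Raw similarity and BM25 scores never enter the calculation, which is why no normalization is needed.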
A hybrid query, conceptually:
results = fuse(
    dense  = ann_search(embed(query), top_k=100),
    sparse = bm25_search(query, top_k=100),
    filter = "tenant = 'acme' AND created >= '2026-01-01'",
    method = RRF(k=60)
)
ranked = rerank(results, query)
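For readers who want to run the conceptual query end to end, here is a toy version in plain Python. Everything in it is a stand-in chosen for illustration: the `CORPUS` rows, the hand-written 2-D "embeddings", brute-force cosine similarity in place of ANN search, a matching-term count in place of BM25, and the function name `hybrid_search`. The rerank step is omitted. It shows the order of operations, not a production design.

```python
import math

# Hypothetical corpus: id -> (tenant, embedding, text).
CORPUS = {
    "a": ("acme", (0.9, 0.1), "reset password for admin account"),
    "b": ("acme", (0.8, 0.3), "recover a forgotten login credential"),
    "c": ("acme", (0.1, 0.9), "password policy for invoice exports"),
    "d": ("other", (0.9, 0.1), "reset password for admin account"),
}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def hybrid_search(query_text, query_vec, tenant, top_k=3, k=60):
    # 1. Metadata filter first: ineligible rows are never scored.
    rows = {i: row for i, row in CORPUS.items() if row[0] == tenant}

    # 2. Dense ranking: brute-force cosine similarity (stand-in for ANN).
    dense = sorted(rows, key=lambda i: -cosine(query_vec, rows[i][1]))

    # 3. Lexical ranking: count of matching query terms (stand-in for BM25).
    terms = set(query_text.lower().split())

    def overlap(i):
        return len(terms & set(rows[i][2].lower().split()))

    sparse = sorted((i for i in rows if overlap(i) > 0), key=lambda i: -overlap(i))

    # 4. Reciprocal Rank Fusion over the two ranked lists.
    fused = {}
    for ranking in (dense, sparse):
        for rank, i in enumerate(ranking, start=1):
            fused[i] = fused.get(i, 0.0) + 1.0 / (rank + k)
    return sorted(fused, key=lambda i: (-fused[i], i))[:top_k]

results = hybrid_search("reset password", (1.0, 0.0), tenant="acme")
```

Row `d` matches the query perfectly but belongs to another tenant, so the filter removes it before any scoring happens. Row `c` is a weak semantic match but shares a keyword with the query, and fusion lifts it above `b`, which only the dense ranking returned.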
The common alternative is a 3-system tax: a keyword engine such as Elasticsearch or OpenSearch (both now also offer kNN vector search), plus a separate vector database, plus a reranker. That means schema duplicated across stores, indexes that drift out of sync as data updates land at different times, and three sets of ops to scale and monitor.
In practice (example)
This is where consolidation shows up as a product capability. Zilliz Vector Lakebase exposes Full-Spectrum Search — dense vector, full-text BM25, metadata filtering, and reranking resolved inside a single query plan, alongside regex/grep, JSON, geospatial, multi-vector, range, iterative, and multi-path retrieval. Fusion and reranking are built in: RRF and Weighted, plus Boost and Decay rerankers and model-based rerankers from Cohere and Voyage AI.
In practice that means a RAG or search service issues one request — filter by tenant and recency, retrieve semantically and lexically, fuse, rerank — and gets one ranked list back. There's a single copy of the data and a single index lifecycle, so the keyword view and the vector view describe the same rows at the same moment. Lakebase serves these queries through tiered compute: a Performance-Optimized tier handles hot, interactive traffic at 1,000+ QPS with single-digit-ms latency, while a Tiered-Storage tier serves colder data at 10–50 QPS around 100 ms with a 95%+ cache hit rate — so a hybrid query against warm data returns in 10–50 ms rather than the cold-read cost of object storage. Lakebase builds on the open-source Milvus engine, so the hybrid query surface is the familiar one, not a reimplementation.
Related questions
- Can you search a data lake without moving data?
- How to add vector search to Apache Iceberg tables
- What is compute-storage separation in vector databases?
- Vector Lakebase
In short. One engine can resolve dense vector, BM25 keyword, and metadata filters in a single query plan, fuse them with RRF or weights, and rerank, replacing a brittle three-system stack. The payoff is no index drift, no duplicated schema, and one ops surface.


