What is continuous serving and discovery (CS/CD) in a vector lakebase?

Last updated: 2026-06-26 · By Vector Search Engineering, Zilliz

Direct answer. It's a data flywheel in which online serving (low-latency retrieval for RAG, recommendation, and agents) and offline discovery (clustering, dedup, re-embedding, and evaluation) run against one source of truth. Serving generates feedback, discovery improves the underlying data, and those improvements flow back to serving through an atomic snapshot handoff, so a query never sees a half-built index. Zilliz names this loop continuous serving and discovery (CS/CD). The name is Zilliz's own label for closing the online/offline gap in unstructured-data systems, not a generic industry term.

How this works

Most AI stacks split into two halves that rarely touch. Online serving answers queries in 1-10 ms for RAG, recommendation, and agents. Offline batch does the heavy, latency-tolerant work — clustering, deduplication, re-embedding when a new model ships, and offline evaluation of retrieval quality — usually in a batch engine like Spark or Ray.
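As one concrete instance of the offline evaluation mentioned above, retrieval quality is often summarized with recall@k: the fraction of known-relevant items that show up in the top k results. The sketch below is a minimal, library-free version. The function names and the toy IDs are illustrative assumptions, not part of any Zilliz or Milvus API.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant IDs that appear in the first k retrieved IDs."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])  # set() so a repeated ID is not counted twice
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical evaluation set: two queries with hand-labelled relevant IDs.
runs = [
    (["a", "b", "c"], {"a", "c"}),  # both relevant items are in the top 3
    (["x", "y", "z"], {"q", "x"}),  # only one of two relevant items is found
]
score = mean_recall_at_k(runs, k=3)  # (1.0 + 0.5) / 2 = 0.75
```

Running the same evaluation before and after a discovery job (for example, a re-embedding pass) gives a simple regression signal for whether the new data should be promoted to serving.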

The problem is that these halves usually live in separate systems. Improving the data means exporting it from the serving store, processing it in Spark, then loading the result back — slow iteration, and a standing risk of drift between what serving holds and what the source actually says. At billion-vector scale, simply moving the embeddings between systems can take days before any new index is even built.
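A rough back-of-the-envelope calculation shows why data movement alone becomes a bottleneck. The sketch assumes one billion 768-dimensional vectors stored as 4-byte float32 values, and an effective end-to-end throughput of 50 MB/s for export and for load. Both the float32 encoding and the throughput figure are assumptions chosen for illustration, not measurements.

```python
def corpus_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw size of the embeddings only (no IDs, metadata, or index)."""
    return num_vectors * dim * bytes_per_value


def transfer_hours(total_bytes, bytes_per_second):
    """Time to move total_bytes once at a sustained throughput."""
    return total_bytes / bytes_per_second / 3600


size = corpus_bytes(1_000_000_000, 768)   # 3_072_000_000_000 bytes, about 3.07 TB
one_way = transfer_hours(size, 50e6)      # about 17.1 hours at the assumed 50 MB/s
round_trip = 2 * one_way                  # export plus load: about 34.1 hours
```

Under these assumptions the export and reload alone take well over a day, before any processing or index build. A faster link shortens this, but the cost is paid again on every iteration of the loop.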

The fix is to treat both as one loop over a shared source of truth — typically lake tables on S3 in open formats like Iceberg, Delta Lake, Parquet, or Lance, joinable from Ray and Spark without a copy. Feedback (queries, clicks, traces) often arrives over a stream such as Kafka. The loop runs: serving → feedback → discovery (clustering, dedup, re-embedding, eval) → better data and a freshly built index → back to serving. This is the data flywheel idea applied to retrieval — each turn of the wheel makes the next one cheaper.

The handoff is the hard part. Discovery rewrites data and rebuilds indexes in the background; serving must keep answering the whole time. The standard guarantee is an atomic snapshot: serving reads the existing snapshot until the new data and its indexes are fully built, then switches in one step. No query is ever exposed to a partially written index or a half-rebuilt embedding column.
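The handoff described above can be sketched as an immutable snapshot object behind a single reference that is swapped in one step. This is a toy, in-process illustration under stated assumptions: `Snapshot`, `SnapshotStore`, `build_snapshot`, and `nearest` are hypothetical names, the "index" is a brute-force scan over a frozen tuple, and a real system would perform the equivalent swap on a catalog or manifest pointer rather than a Python attribute.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """A fully built, read-only view: a version plus frozen (id, vector) rows."""
    version: int
    rows: tuple


class SnapshotStore:
    def __init__(self, initial):
        self._current = initial
        self._publish_lock = threading.Lock()  # serializes writers only

    def current(self):
        # A reader takes one reference and uses it for the whole query.
        return self._current

    def publish(self, new):
        with self._publish_lock:
            if new.version <= self._current.version:
                raise ValueError("stale snapshot")
            # Rebinding one reference is the switch: readers see either the
            # old snapshot or the new one, never a mix of the two.
            self._current = new


def build_snapshot(version, vectors):
    # All building happens here, off to the side, before publish() is called.
    rows = tuple(sorted((vid, tuple(vec)) for vid, vec in vectors.items()))
    return Snapshot(version, rows)


def nearest(store, query):
    snap = store.current()  # pin one snapshot for this query
    best = min(snap.rows, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[1], query)))
    return snap.version, best[0]


store = SnapshotStore(build_snapshot(1, {"a": (0.0, 0.0), "b": (1.0, 1.0)}))
pinned = store.current()                              # an in-flight query holds version 1
store.publish(build_snapshot(2, {"a": (0.0, 0.0)}))   # discovery result goes live
```

After `publish`, new queries read version 2, while `pinned` still points at the complete version 1 rows. The old snapshot stays valid until its last reader drops it, which is what lets serving continue uninterrupted during the switch.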

In practice (example)

Zilliz Vector Lakebase calls this loop CS/CD (Continuous Serving + Continuous Discovery), its own concept rather than an industry standard. Lakebase builds on the Milvus serving engine, and both modes run on the Loon storage engine (which uses the Vortex format) over the same lake tables, so serving and discovery share one copy of the data and one index.

When a discovery job finishes (say, a re-embedding or dedup pass), it publishes its results back as a new snapshot. Serving keeps reading the old snapshot until the new data and the rebuilt indexes are ready, then switches atomically, so a half-built index is never exposed to live traffic. Discovery runs in an offline-batch compute mode (one of three compute modes), and its resources are released when the job completes.

The payoff is iteration speed. For one autonomous-driving customer (1B × 768-dim vectors), a deduplication workflow that took roughly 70 hours as one-by-one ANN lookups dropped to about 10 hours as an index-aware offline-batch job, roughly a 7x reduction on the same resource class; exact figures depend on data distribution and parameters. The improved corpus then flows straight back to serving through the snapshot handoff, with no separate migration step.
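To give a feel for why a batch formulation can beat per-record lookups, the toy sketch below compares two dedup strategies. It is not Zilliz's implementation: the function names, the squared-distance threshold, and the coarse grid used as a stand-in for an index's partitions are all assumptions made for illustration.

```python
from collections import defaultdict


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dedup_one_by_one(vectors, threshold):
    """Baseline: check each vector against every vector kept so far."""
    kept = []
    for vid, vec in vectors:
        if all(sq_dist(vec, kept_vec) > threshold for _, kept_vec in kept):
            kept.append((vid, vec))
    return sorted(vid for vid, _ in kept)


def dedup_partitioned(vectors, threshold, cell):
    """Batch variant: group by coarse grid cell, then compare within each cell.

    Approximate by design: near-duplicates that fall on opposite sides of a
    cell boundary are not compared, so some may survive.
    """
    buckets = defaultdict(list)
    for vid, vec in vectors:
        key = tuple(int(x // cell) for x in vec)  # hypothetical partition key
        buckets[key].append((vid, vec))
    kept = []
    for members in buckets.values():
        kept.extend(dedup_one_by_one(members, threshold))
    return sorted(kept)


vectors = [
    ("a", (0.10, 0.10)),
    ("b", (0.11, 0.10)),  # near-duplicate of "a"
    ("c", (5.00, 5.00)),
    ("d", (5.01, 5.00)),  # near-duplicate of "c"
]
baseline = dedup_one_by_one(vectors, threshold=0.01)               # ["a", "c"]
batched = dedup_partitioned(vectors, threshold=0.01, cell=1.0)     # ["a", "c"]
```

The partitioned version only compares vectors that share a cell, so the number of comparisons grows with partition size instead of corpus size, and each partition can be processed independently in a batch engine. The trade-off is the boundary case noted in the docstring.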
