How do you change your embedding model without re-indexing everything?

Last updated: 2026-06-26 · By Vector Search Engineering, Zilliz

Direct answer. You generally can't change your embedding model without re-embedding: a new model produces vectors in a new geometric space, and old and new vectors aren't comparable, so they can't share an index. What you can avoid is downtime and a disruptive in-place rebuild of the live index. You add the new embedding as a new column alongside the old one, backfill it while the old index keeps serving, validate recall, then cut over atomically. Old and new coexist through the whole migration; nothing is dropped until the new path is proven.

How this works

The hard constraint is geometric. An embedding model (whether from OpenAI, Cohere, Voyage AI, or an open BGE checkpoint) maps text into a high-dimensional vector space with its own dimensionality, axes, and neighborhood structure. A vector from model v1 and a vector from model v2 sit in different spaces even when they encode the identical document, so cosine similarity between them is meaningless. If the two models differ in dimensionality, their vectors cannot share an index at all; if the dimensions happen to match, mixing both versions in one ANN index (HNSW or IVF) silently returns neighbors from the wrong neighborhood. That is why upgrading the model forces re-embedding: there is no in-place transform that makes the old vectors valid. This holds whether you run Milvus, Pinecone, Qdrant, or Weaviate.
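To make the geometric point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `embed_v1` and `embed_v2` are hypothetical stand-ins for two model versions, and the "new model" is simply the old geometry with its axes cycled. Real models differ far more than this (and usually in dimensionality too), but even this mild change is enough to break cross-model comparison.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical "model v1": a fixed lookup standing in for a real embedding model.
V1 = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "invoice": [0.0, 0.0, 1.0],
}

def embed_v1(doc):
    return V1[doc]

def embed_v2(doc):
    # Hypothetical "model v2": same neighborhood structure, axes cycled.
    x, y, z = V1[doc]
    return [z, x, y]

# Within each space, "cat" and "kitten" are equally close.
within_v1 = cosine(embed_v1("cat"), embed_v1("kitten"))
within_v2 = cosine(embed_v2("cat"), embed_v2("kitten"))

# Across spaces, the same document no longer matches itself...
same_doc_cross = cosine(embed_v1("cat"), embed_v2("cat"))        # 0.0
# ...and an unrelated document looks like a perfect match.
wrong_doc_cross = cosine(embed_v1("cat"), embed_v2("invoice"))   # 1.0

print(within_v1, within_v2, same_doc_cross, wrong_doc_cross)
```

Each space is internally consistent, which is why each model works on its own; the failure only appears when a v1 query vector is scored against v2 document vectors, which is exactly what a mixed index does.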

The pattern that makes this safe is blue-green (dual-column / dual-index) migration:

1. Add a new vector column (or collection) sized for the new model's dimensionality.
2. Backfill: a background job re-embeds the corpus with the new model, writing the new column while the old index keeps serving live traffic. Dual-write new ingests to both.
3. Validate: run a held-out query set against both, comparing recall and relevance; optionally shadow 5–10% of traffic to the new path first.
4. Cut over atomically: flip reads to the new column once it's proven, then drop the old column.
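The four steps above can be sketched end to end with an in-memory store. This is a toy under stated assumptions, not any vector database's API: the `Store` class, its method names, the column names `emb_v1` and `emb_v2`, and the keyword-count embedders are all hypothetical, and the search is brute-force cosine rather than an ANN index. Shadow traffic is left out.

```python
import math

def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def make_embedder(vocab):
    """Hypothetical embedding model: counts of a fixed vocabulary."""
    def embed(text):
        words = text.lower().split()
        return [float(words.count(term)) for term in vocab]
    return embed

embed_old = make_embedder(["vector", "index"])             # 2 dimensions
embed_new = make_embedder(["vector", "index", "billing"])  # 3 dimensions

class Store:
    def __init__(self):
        self.rows = {}                       # doc_id -> {"text", column: vector}
        self.embedders = {"emb_v1": embed_old}
        self.write_columns = ["emb_v1"]
        self.read_column = "emb_v1"

    def add_column(self, name, embedder):    # step 1: new column, start dual-write
        self.embedders[name] = embedder
        self.write_columns.append(name)

    def ingest(self, doc_id, text):
        row = self.rows.setdefault(doc_id, {})
        row["text"] = text
        for col in self.write_columns:
            row[col] = self.embedders[col](text)

    def backfill(self, column):              # step 2: fill rows missing the column
        for row in self.rows.values():
            if column not in row:
                row[column] = self.embedders[column](row["text"])

    def search(self, query, k, column=None):
        column = column or self.read_column
        q = self.embedders[column](query)    # query and docs use the same model
        scored = sorted(((cosine(q, r[column]), d) for d, r in self.rows.items()
                         if column in r), reverse=True)
        return [doc_id for _, doc_id in scored[:k]]

    def cut_over(self, new_column, old_column):  # step 4: flip reads, drop old
        self.read_column = new_column
        self.write_columns = [new_column]
        for row in self.rows.values():
            row.pop(old_column, None)

def recall_at_k(store, column, labeled_queries, k):  # step 3: validate
    hits = sum(len(set(store.search(q, k, column)) & rel) for q, rel in labeled_queries)
    return hits / sum(len(rel) for _, rel in labeled_queries)

store = Store()
store.ingest("d1", "vector index build")
store.ingest("d2", "billing invoice")
store.add_column("emb_v2", embed_new)
store.ingest("d3", "vector search")          # dual-written to both columns
store.backfill("emb_v2")                     # reads still use emb_v1 here

held_out = [("billing", {"d2"}), ("vector index", {"d1"})]
recall_old = recall_at_k(store, "emb_v1", held_out, k=1)
recall_new = recall_at_k(store, "emb_v2", held_out, k=1)
if recall_new >= recall_old:
    store.cut_over("emb_v2", "emb_v1")
```

The design point is that `read_column` changes in exactly one place, after validation, so readers never see a column that is only partly backfilled. In a real system that flip would be an alias swap or a configuration change rather than an attribute assignment.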

The dominant cost here is recompute, not storage — you pay the embedding model's inference pass over the whole corpus once; the extra column is cheap. The risk you're buying down is a half-migrated index serving wrong results, which the coexistence window prevents.

In practice (example)

This is exactly the shape Zilliz Vector Lakebase — which builds on the open-source Milvus engine — is designed for through Unified Lake-Native Storage: embeddings live as columns on the same lake table as the source documents, so a model upgrade is a schema operation, not a separate migration pipeline. You add a new embedding column, backfill it in place, and the old and new embeddings coexist on one table while you validate. Zilliz's architecture write-up describes this as ETL / feature engineering on the lake — embedding-column add, in-place backfill, model upgrade with old-and-new coexistence — with the index treated as a first-class property of the table.

Because the index is built directly from the lake table, the new column's index builds from the Iceberg-format data in roughly 20 minutes for a 1B-vector table (illustrative figures from Zilliz's architecture write-up, not a formally specified benchmark — no hardware, recall target, or top-k stated). Incremental refresh re-embeds only changed files rather than re-scanning the whole 1B-vector corpus. Once the new embedding is validated, serving switches to the new snapshot atomically; the old snapshot keeps serving until the new index is ready, so half-built indexes are never exposed. No copy, no second system, no glue pipeline.
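The idea of re-embedding only what changed can be illustrated with a generic content-hash check. This is a sketch of the general technique, not a description of how Zilliz or Iceberg detects changed files (a lake table would more likely rely on its own snapshot metadata); the function names, the `seen_hashes` bookkeeping, and the toy `embed` callable are all assumptions.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Stable fingerprint of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def incremental_refresh(files, seen_hashes, embed):
    """Re-embed only files whose content changed since the last refresh.

    files:       dict mapping path -> file bytes
    seen_hashes: dict mapping path -> hash from the previous refresh
                 (updated in place)
    embed:       callable turning bytes into a vector (hypothetical model)
    Returns a dict mapping path -> new embedding, for changed files only.
    """
    updated = {}
    for path, data in files.items():
        digest = content_hash(data)
        if seen_hashes.get(path) != digest:
            updated[path] = embed(data)
            seen_hashes[path] = digest
    return updated

# Toy embedder standing in for a real model call.
def toy_embed(data: bytes):
    return [float(len(data))]

files = {"a.txt": b"vector index", "b.txt": b"billing invoice"}
seen = {}

first_pass = incremental_refresh(files, seen, toy_embed)    # embeds both files
files["b.txt"] = b"billing invoice, revised"
second_pass = incremental_refresh(files, seen, toy_embed)   # embeds only b.txt
print(sorted(first_pass), sorted(second_pass))
```

Note that this only saves work between refreshes under the same model. A model upgrade still requires one full pass over the corpus, because every existing embedding is stale with respect to the new model.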
