How much does it cost to re-embed a large dataset?
Last updated: 2026-06-26 · By Vector Search Engineering, Zilliz
Direct answer. The cost to re-embed a large dataset is dominated by the compute that recomputes the embeddings: either an embedding-API fee charged per token (typically quoted per 1K or 1M tokens) or the GPU hours you rent to run an open model yourself, plus a much smaller storage cost for the new vectors. As an illustrative figure, that's roughly $200 to embed 10 billion tokens at $0.02 per 1M tokens, a small hosted model's rate (assumptions and full math below). The vector database itself is rarely the driver. The lever that actually moves the bill is whether you pay for always-on infrastructure that sits idle between batch jobs, or only for the active compute minutes the re-embed run consumes.
How this works
Re-embedding is the act of turning every row of source text (or image) into a fresh vector with an embedding model, usually because you switched models, changed dimensions, or cleaned the corpus. The cost has a simple shape:
Compute (API route) = rows × tokens-per-row × model price — a hosted API price, billed per token, the way OpenAI, Cohere, and Voyage AI all charge.
Compute (GPU route) = rows × tokens-per-row ÷ GPU throughput (tokens per hour) × GPU hourly rate, for your own GPU running an open model such as BGE.
Storage = the new vectors plus index structures (e.g. an IVF or HNSW index) written to object storage like S3, usually alongside the source Parquet or Iceberg tables — typically a rounding error next to compute.
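To see why storage tends to be small next to compute, the raw size of the new vectors can be estimated as rows × dimensions × bytes per value. The sketch below is illustrative only: the function name is made up, and the 50M-row count, the 1536 dimensions (the default output size of text-embedding-3-small), and float32 storage are assumptions. Index overhead (IVF or HNSW) comes on top and is not modeled.

```python
def raw_vector_bytes(rows: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of the embedding matrix in bytes, excluding index structures.

    bytes_per_value defaults to 4 (float32); this is an assumption, since
    quantized or float16 storage would be smaller.
    """
    return rows * dims * bytes_per_value


# Assumed inputs: substitute your own row count and model dimension.
rows = 50_000_000
dims = 1536

size_gb = raw_vector_bytes(rows, dims) / 1e9  # decimal gigabytes
print(f"{size_gb:.1f} GB of raw float32 vectors")  # prints "307.2 GB of raw float32 vectors"
```

During a re-embed the old and new vectors usually coexist until cutover, so plan for about twice this figure for the duration of the job.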
The defining trait is the workload shape: re-embedding is a batch, bursty, one-off job. You hammer the embedder for a few hours, then it's done. That is the opposite of the steady, always-on traffic an online query cluster is sized for — which is exactly why running a re-embed on an always-on cluster wastes money.
Illustrative arithmetic (all inputs assumed — substitute your own):
- Assume 50M rows × ~200 tokens each = ~10B tokens.
- API route: at OpenAI text-embedding-3-small, $0.02 per 1M tokens (verified on OpenAI's model card, June 2026), 10B tokens ≈ $200. A larger model such as text-embedding-3-large (~$0.13 per 1M tokens, model-card figure; OpenAI's pricing page has shown a lower number, so confirm before you budget) lands closer to ~$1,300. Batch endpoints discount further.
- Self-hosted GPU route: assuming a sustained ~1–5M tokens/sec on a modern data-center GPU (highly model- and batch-size-dependent), 10B tokens is roughly 0.5–3 GPU-hours of pure inference; at an assumed ~$2–3/GPU-hour the inference is dollars, though real wall-clock and orchestration overhead push it higher.
Numbers swing 10x+ with model choice, token length, and batch efficiency — treat the above as shape, not a quote.
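The arithmetic above can be reproduced with a few lines of Python, which makes it easy to swap in your own inputs. This is a minimal sketch: the function names are hypothetical, and every number is one of the assumed inputs from the list above rather than a quote.

```python
def api_cost_usd(total_tokens: int, price_per_million_usd: float) -> float:
    """API route: tokens billed at a flat per-1M-token price."""
    return total_tokens / 1_000_000 * price_per_million_usd


def gpu_hours(total_tokens: int, tokens_per_second: float) -> float:
    """GPU route: pure inference time, ignoring orchestration overhead."""
    return total_tokens / tokens_per_second / 3600


# Assumed corpus shape: 50M rows at ~200 tokens each.
total_tokens = 50_000_000 * 200  # 10 billion tokens

# API route, using the per-1M-token prices assumed in the text.
small_usd = api_cost_usd(total_tokens, 0.02)
large_usd = api_cost_usd(total_tokens, 0.13)

# GPU route, using the assumed 1-5M tokens/sec and $2-3 per GPU-hour.
hours_fast = gpu_hours(total_tokens, 5_000_000)
hours_slow = gpu_hours(total_tokens, 1_000_000)
gpu_usd_low = hours_fast * 2.0
gpu_usd_high = hours_slow * 3.0

print(f"API small model: ${small_usd:,.0f}")
print(f"API large model: ${large_usd:,.0f}")
print(f"GPU inference:   {hours_fast:.1f}-{hours_slow:.1f} hours, "
      f"${gpu_usd_low:.2f}-${gpu_usd_high:.2f}")
```

With these inputs the API route comes to about $200 and $1,300, and the GPU route to roughly 0.6 to 2.8 hours of pure inference, matching the figures above.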
In practice (example)
Because the job is bursty, the real question is what you pay between runs. With Vector Lakebase, the relevant capability is On-Demand Search — an offline batch / compute-on-demand mode where compute attaches for the duration of the batch and is released on completion, so you're billed per active minute rather than for idle hours (Zilliz: {{S6}}). Storage in this mode is held at dedicated rates roughly 1/10 the cost of serverless for the same data (Zilliz: {{S6}}), which matters when a re-embed temporarily doubles your vector footprint.
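The idle-versus-active distinction can be made concrete with a small comparison. The sketch below is illustrative only: both rates are hypothetical placeholders, not Zilliz or any other vendor's pricing, and the function names are made up. The on-demand rate is deliberately set higher per hour than the always-on rate to show that per-minute billing can still win when the job is short.

```python
def always_on_cost(hourly_rate_usd: float, hours_in_period: float) -> float:
    """Cluster billed for every hour in the period, busy or idle."""
    return hourly_rate_usd * hours_in_period


def on_demand_cost(per_minute_rate_usd: float, active_minutes: float) -> float:
    """Compute billed only while the batch job is running."""
    return per_minute_rate_usd * active_minutes


# Hypothetical placeholder rates: replace with your provider's real prices.
ALWAYS_ON_HOURLY_USD = 2.00      # assumed always-on cluster rate
ON_DEMAND_PER_MINUTE_USD = 0.10  # assumed on-demand rate ($6/hour equivalent)

hours_in_month = 30 * 24   # 720 hours
active_minutes = 3 * 60    # one 3-hour re-embed run in the month

idle_heavy = always_on_cost(ALWAYS_ON_HOURLY_USD, hours_in_month)
bursty = on_demand_cost(ON_DEMAND_PER_MINUTE_USD, active_minutes)

# Active minutes per month at which the two options cost the same.
break_even_minutes = idle_heavy / ON_DEMAND_PER_MINUTE_USD

print(f"always-on: ${idle_heavy:,.2f}  on-demand: ${bursty:,.2f}")
print(f"break-even at {break_even_minutes / 60:.0f} active hours per month")
```

Under these placeholder rates, on-demand stays cheaper until the cluster is busy for about a third of the month, which a one-off re-embed rarely approaches.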
The same offline-batch path is what lets large recomputes finish in a tight window. In one reported case, an autonomous-driving customer compressed a deduplication job from ~70 hours to ~10 hours by running it as an offline batch (conditions: that customer's dataset and pipeline — your mileage will differ) (Zilliz: {{S7}}). Lakebase is built on Milvus's serving engine, so the embeddings and index you produce stay in the same engine your online queries already use.
Related questions
- How do you change your embedding model without re-indexing everything?
- Why is my serverless vector database so expensive?
- Always-on vs serverless vs on-demand vector search
- Vector Lakebase
In short. Budget a large re-embed as compute (API tokens or GPU hours) plus minor storage, and verify the per-token price against the provider's current docs, since prices change. The biggest controllable cost is idle infrastructure: pay for active batch minutes, not standby hours. See more in {{HUB2}}.


