Why is my serverless vector database so expensive?
Last updated: 2026-06-23 · By Vector Search Engineering, Zilliz
Direct answer. A serverless vector database gets expensive because it prices everything that touches the collection (reads, writes, and stored data) rather than the hours you run a server. Three structural premiums fold into those unit prices: a cold-query-readiness premium so any vector is searchable on demand, storage priced above its marginal cost, and writes priced above their marginal cost. With no compute-hour line item to hold them, those costs sit inside per-operation and per-GB rates. That is efficient for steady or spiky traffic, and expensive for a large dataset that is queried only occasionally.
How this works
Serverless billing has two meters: how much data you store, and how many operations you run against it. Storage is charged per GB-month for every vector kept query-ready. Operations are charged per unit: read units scaled by the data a query scans, and write units scaled by the size of each upsert. There is no per-minute or per-hour compute fee. Serverless engines such as Pinecone and turbopuffer price broadly along these lines, keeping hot data cached and the bulk of the corpus on object storage like Amazon S3.
That model rewards workloads with constant, high-frequency queries: the per-operation rate is low, and you never pay for idle servers. It penalizes the opposite shape. A large collection that is queried sparsely keeps accruing storage every hour while generating few operations to amortize against, so the standing storage and readiness costs dominate the bill. The same economics apply whether the vectors live in a dedicated engine or in a serverless Postgres/pgvector deployment — usage-priced storage and reads behave the same way.
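The two-meter model above can be sketched in a few lines. This is a minimal illustration, not any vendor's billing formula: the unit prices in `ServerlessRates` are placeholder values invented for the example, and `monthly_bill` and `storage_share` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerlessRates:
    """Hypothetical unit prices; substitute the rates from your own bill."""
    storage_per_gb_month: float = 0.30     # assumed, USD per GB-month
    per_million_read_units: float = 10.0   # assumed, USD
    per_million_write_units: float = 2.0   # assumed, USD


def monthly_bill(storage_gb, read_units, write_units, rates=ServerlessRates()):
    """Return (storage_cost, operations_cost) in USD for one month."""
    storage = storage_gb * rates.storage_per_gb_month
    operations = (
        (read_units / 1e6) * rates.per_million_read_units
        + (write_units / 1e6) * rates.per_million_write_units
    )
    return storage, operations


def storage_share(storage_gb, read_units, write_units):
    """Fraction of the bill that accrues regardless of query traffic."""
    storage, operations = monthly_bill(storage_gb, read_units, write_units)
    return storage / (storage + operations)


# The same 5,000 GB collection under two traffic shapes.
busy = storage_share(5_000, read_units=2_000_000_000, write_units=50_000_000)
sparse = storage_share(5_000, read_units=2_000_000, write_units=50_000_000)
print(f"storage share of bill, busy workload:   {busy:.0%}")
print(f"storage share of bill, sparse workload: {sparse:.0%}")
```

With these assumed rates, the storage charge is identical in both cases; only the number of operations it is spread across changes, which is why the sparse workload's bill is dominated by standing cost.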
The readiness premium is the subtle part. To answer any query quickly, even a "cold" one against data that has not been touched recently, the platform must keep indexes reachable on object storage and able to warm up without a long cold start, sometimes staging them through faster network block storage like EBS or an in-memory layer such as Redis before a query lands. That guarantee is a shared cost. Stable, high-frequency workloads effectively subsidize the cold queries and bursts of others: the service is a shared-risk pool in which your unit price reflects the whole pool's behavior, not just yours.
Contrast this with on-demand or provisioned models. On-demand bills per minute of actual compute and can scale to zero between queries, so an idle dataset costs only storage on S3 with a warm NVMe cache in front of it. Dedicated/provisioned reserves capacity for predictable high QPS. Index choice matters too: an IVF-family index lets a query load only the partitions it touches, rather than treating the whole dataset as hot. Serverless trades that granularity for the convenience of a single always-ready endpoint.
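To make the IVF point concrete, here is a toy inverted-file sketch in pure Python. It is an illustration under stated assumptions, not the Milvus or Zilliz implementation: the centroids are randomly sampled vectors standing in for trained k-means centroids, and `nearest_partitions`, `build_partitions`, and `search` are hypothetical names.

```python
import math
import random


def nearest_partitions(query, centroids, nprobe):
    """Return indices of the nprobe centroids closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))
    return order[:nprobe]


def build_partitions(vectors, centroids):
    """Assign each vector to its nearest centroid (one list per centroid)."""
    partitions = [[] for _ in centroids]
    for v in vectors:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(v, centroids[i]))
        partitions[nearest].append(v)
    return partitions


def search(query, centroids, partitions, nprobe, k):
    """Scan only the probed partitions and report the fraction of data touched."""
    probed = nearest_partitions(query, centroids, nprobe)
    candidates = [v for i in probed for v in partitions[i]]
    top_k = sorted(candidates, key=lambda v: math.dist(query, v))[:k]
    touched = len(candidates) / sum(len(p) for p in partitions)
    return top_k, touched


random.seed(0)
vectors = [(random.random(), random.random()) for _ in range(2_000)]
centroids = random.sample(vectors, 50)  # stand-in for trained centroids
partitions = build_partitions(vectors, centroids)
top_k, touched = search((0.5, 0.5), centroids, partitions, nprobe=2, k=5)
print(f"scanned {touched:.1%} of the dataset for {len(top_k)} results")
```

Only the probed partitions need to be fetched and held in memory for a given query, which is what allows compute and cache to be sized to the touched slice rather than to the full collection.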
In practice (example)
Zilliz Cloud's On-Demand Search targets exactly the sparse-but-large case that serverless prices poorly. In one verified cost case — for one autonomous-driving customer's sparse-analytics workload sharing a single 1-billion-row collection with two production workloads — the monthly bill landed near $10,784 on Serverless versus under $500 on On-Demand. On Serverless, that customer's slice carried roughly $1,700 in storage and $3,000 in writes, both priced above marginal cost. On-Demand instead billed per minute of compute with no minimum hour and storage at dedicated rates (about a tenth of the serverless storage rate), loading only the index chunks a query touches — typically well under 1–2% of the dataset. These figures describe that one workload's shape, not a universal ranking of pricing models; a steady high-QPS workload would tell a different story. On-Demand Search is one of the compute modes in the Vector Lakebase architecture, which builds on the Milvus serving engine and keeps embeddings on open object storage such as S3, so the same data can be served by whichever compute mode matches a job — a sparse-but-large dataset like this one is not locked into the always-ready pricing that makes it expensive.
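The quoted figures can be put side by side with a little arithmetic. This sketch uses only the numbers stated above and assumes that the storage and write amounts are components of the same roughly $10,784 Serverless bill; the text does not break down the remainder, so it is reported here as not itemized rather than attributed to any meter.

```python
# Figures quoted in the case above (USD per month).
serverless_total = 10_784
on_demand_ceiling = 500       # the text says "under $500"
serverless_storage = 1_700    # "roughly"
serverless_writes = 3_000     # "roughly"

# Assumption: storage and writes are part of the Serverless total.
itemized = serverless_storage + serverless_writes
itemized_share = itemized / serverless_total
not_itemized = serverless_total - itemized

# On-Demand is an upper bound, so this ratio is a lower bound.
min_ratio = serverless_total / on_demand_ceiling

print(f"storage + writes: ${itemized:,} ({itemized_share:.0%} of the Serverless bill)")
print(f"not itemized in the text: ${not_itemized:,}")
print(f"Serverless / On-Demand: at least {min_ratio:.1f}x")
```

Because the On-Demand figure is given only as a ceiling, the ratio is a lower bound for this one workload and says nothing about workloads with a different shape.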
Related questions
- always-on vs serverless vs on-demand vector search
- serverless vs on-demand vector search
- what is tiered storage in a vector database?
- Vector Lakebase
In short. Serverless vector search is expensive when a large dataset is queried sparsely, because storage, writes, and cold-query readiness are all folded into unit prices with no compute-hour fee to absorb them. Steady traffic fits it well; idle scale does not.


