Dataset Note:

"Single tenant LAION 100M" below refers to the "Laion 100m" dataset (100M × 768-dimensional float dense vectors).
"Multitenant Cohere 10M" below refers to the "Cohere Large" dataset (10M × 768-dimensional float dense vectors). All data are randomly split across 1,000 tenants.

Cost Comparison

Pareto cost lines show how much sustained QPS each cloud vector database can deliver for a given operating spend. They combine measured search results with public pricing models.
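As a rough illustration of what a Pareto cost line contains, the sketch below keeps only the configurations that no cheaper-or-equal configuration beats on QPS. The `Point` type, the `pareto_frontier` helper, and every number in it are hypothetical placeholders, not measurements or pricing from this benchmark.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    name: str
    cost_per_hour: float  # operating spend in USD / hour
    qps: float            # sustained queries per second


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep points that are not beaten on QPS by any cheaper-or-equal point."""
    # Sort by cost ascending; on equal cost, put the higher QPS first.
    ordered = sorted(points, key=lambda p: (p.cost_per_hour, -p.qps))
    frontier = []
    best_qps = float("-inf")
    for p in ordered:
        # A point joins the frontier only if it improves on the best QPS so far.
        if p.qps > best_qps:
            frontier.append(p)
            best_qps = p.qps
    return frontier


# Hypothetical configurations (illustrative values only).
points = [
    Point("config-a", 1.0, 50.0),
    Point("config-b", 2.0, 40.0),   # costs more than config-a, delivers less
    Point("config-c", 3.0, 400.0),
    Point("config-d", 3.0, 380.0),  # same cost as config-c, delivers less
    Point("config-e", 8.0, 800.0),
]
frontier = pareto_frontier(points)
frontier_names = [p.name for p in frontier]
print(frontier_names)
```

Reading the frontier left to right gives the best sustained QPS available at each spend level, which is the shape the cost chart plots.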


Cost vs. QPS Pareto (query only; lower is better; USD / hour)

Notes: All tests were conducted in AWS us-west-2. All costs shown here are based on each product's pricing in this region.

Continuous Ingestion And Search Freshness

This case measures the write-to-serve lifecycle: how long bulk data takes to finish inserting, when it can be searched reliably, and when background indexing has fully caught up.

Dataset: LAION 100M

Product                      Batch size   Inserted   Searchable   Fully Indexed   Write Cost
Zilliz Cloud Capacity 12CU   1,000        2.9 hr     0 ms         7.0 min         $9.12
Zilliz Cloud Capacity 12CU   5,000        3.1 hr     0 ms         1.8 min         $9.25
Zilliz Cloud Capacity 12CU   10,000       3.2 hr     0 ms         1.9 min         $9.5
Zilliz Cloud Tiered 4CU      1,000        4.1 hr     0 ms         10.3 min        $6.3
Zilliz Cloud Tiered 4CU      5,000        4.1 hr     0 ms         9.6 min         $6.29
Zilliz Cloud Tiered 4CU      10,000       4.1 hr     0 ms         10.9 min        $6.34
Turbopuffer                  1,000        53.5 hr    0 ms         3.4 min         $304
Turbopuffer                  5,000        1.9 hr     6.6 hr       2.4 min         $302
Turbopuffer                  10,000       1.8 hr     6.4 hr       2.0 min         $302
Pinecone Serverless          1,000        111.7 hr   0 ms         42 ms           $1,180
Pinecone Serverless          5,000        71.4 hr    0 ms         1 ms            $1,180
Pinecone Serverless          10,000       72.4 hr    0 ms         127 ms          $1,180
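To put the Inserted and Write Cost figures on a common scale, the following sketch converts them into average ingest throughput and write cost per million vectors. It assumes that "Inserted" is the wall-clock time to load all 100M vectors and that "Write Cost" covers the full load; both are readings of the table rather than definitions confirmed by the source. The helper names are illustrative, and only the batch size = 1,000 rows are included.

```python
DATASET_SIZE = 100_000_000  # LAION 100M vectors


def ingest_throughput(inserted_hours: float, total_vectors: int = DATASET_SIZE) -> float:
    """Average vectors per second, assuming `inserted_hours` covers the whole load."""
    return total_vectors / (inserted_hours * 3600.0)


def write_cost_per_million(write_cost_usd: float, total_vectors: int = DATASET_SIZE) -> float:
    """USD per million vectors written, assuming the cost covers the whole load."""
    return write_cost_usd / (total_vectors / 1_000_000)


# (product, inserted hours, write cost in USD) for batch size = 1,000.
rows = [
    ("Zilliz Cloud Capacity 12CU", 2.9, 9.12),
    ("Zilliz Cloud Tiered 4CU", 4.1, 6.3),
    ("Turbopuffer", 53.5, 304.0),
    ("Pinecone Serverless", 111.7, 1180.0),
]

for product, hours, cost in rows:
    print(
        f"{product}: {ingest_throughput(hours):,.0f} vectors/s, "
        f"${write_cost_per_million(cost):.4f} per million vectors"
    )
```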

Notes: For the exact definitions of "inserted", "searchable", and "fully indexed", see the VectorDBBench source code for each client. For basic intuition:

Single & Multitenant Search With Payload

This case measures query behavior after cloud data is already loaded and searchable. It compares peak concurrent QPS, P99 latency, and recall under different response payloads, scalar-filter selectivity, and tenancy modes so the chart shows both throughput and result quality instead of rewarding speed alone.
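For readers unfamiliar with the two quality metrics named above, here is a minimal sketch of recall@k and a P99 latency computation. It uses a set-overlap definition of recall and the nearest-rank percentile method; these are common conventions chosen for illustration, and the benchmark's own implementation may differ in detail. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def recall_at_k(retrieved_ids: Sequence[int], ground_truth_ids: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that appear in the returned top-k."""
    truth = set(ground_truth_ids[:k])
    if not truth:
        return 0.0
    found = set(retrieved_ids[:k]) & truth
    return len(found) / len(truth)


def percentile_nearest_rank(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative inputs only, not benchmark data.
example_recall = recall_at_k([1, 2, 3, 4], [1, 2, 5, 6], k=4)
example_p99 = percentile_nearest_rank([float(ms) for ms in range(1, 101)], 99)
print(example_recall, example_p99)
```

Reporting recall next to QPS and P99 matters because an index can trade accuracy for speed; a higher QPS at noticeably lower recall is not a like-for-like win.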


topK = 100

Vector Search Latency and QPS (unfiltered; ids only; max-concurrency P99)

Product                      Payload             P99 latency (max concurrency)   QPS (max concurrency)   recall@10
Zilliz Cloud Capacity 32CU   2,000 bytes/query   158 ms                          786.1                   0.9728
Turbopuffer                  2,000 bytes/query   2.34 s                          395.7                   0.9321
Zilliz Cloud Capacity 12CU   2,000 bytes/query   299 ms                          376                     0.9723
Turbopuffer Pinned           2,000 bytes/query   3.30 s                          68.2                    0.9321
Zilliz Cloud Tiered 4CU      2,000 bytes/query   5.57 s                          49.2                    0.9510
Pinecone Serverless          2,000 bytes/query   4.85 s                          4.6                     0.9609

Cold Start Latency

This case measures the first query after an idle cold period against the warmed steady-state query path. It isolates cold-start behavior from normal search throughput so the chart shows whether a product has a material warm-up penalty after inactivity.


Cold / Warm Latency (unfiltered; for the ratio, lower is better)

Product                      Cold latency   Warm latency   Cold / Warm ratio
Zilliz Cloud Capacity 12CU   55 ms          54 ms          1.01×
Turbopuffer Pinned           64 ms          45 ms          1.42×
Zilliz Cloud Tiered 4CU      122 ms         57 ms          2.16×
Pinecone Serverless          271 ms         60 ms          4.52×
Turbopuffer                  2048 ms        322 ms         6.36×
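The cold/warm ratio is simply the cold latency divided by the warm latency. The sketch below recomputes it from the rounded latencies reported above. Because the inputs are rounded to whole milliseconds, a recomputed value can differ slightly from the published one (for example, 122 / 57 gives about 2.14 rather than 2.16); the published ratios were presumably derived from unrounded measurements, which is an assumption on our part. The helper name is illustrative.

```python
def cold_warm_ratio(cold_ms: float, warm_ms: float) -> float:
    """Cold latency divided by warm latency; 1.0 means no cold-start penalty."""
    if warm_ms <= 0:
        raise ValueError("warm latency must be positive")
    return cold_ms / warm_ms


# (cold ms, warm ms) as reported, rounded to whole milliseconds.
latencies_ms = {
    "Zilliz Cloud Capacity 12CU": (55, 54),
    "Turbopuffer Pinned": (64, 45),
    "Zilliz Cloud Tiered 4CU": (122, 57),
    "Pinecone Serverless": (271, 60),
    "Turbopuffer": (2048, 322),
}

ratios = {name: cold_warm_ratio(cold, warm) for name, (cold, warm) in latencies_ms.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x")
```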

Notes:

Some products show a more dramatic cold/warm ratio at the P99 percentile, but this usually reflects network jitter in later queries and cannot be fully reproduced. We therefore keep the more faithful definition of cold/warm latency, i.e. the first query of each round.
The point at which a product's collection becomes cold is ambiguous, since most products do not offer public APIs that expose this information. To simulate real-world production settings, we wait at least 24 hours after the last operation on each product before benchmarking cold latency, so that the collections are as cold as possible.