"Single tenant LAION 100M" below refers to the "Laion 100m" dataset (100M × 768-dimensional float dense vectors).
"Multitenant Cohere 10M" below refers to the "Cohere Large" dataset (10M × 768-dimensional float dense vectors). All data are randomly split across 1,000 tenants.
Cost Pareto Lines
Pareto cost lines show how much sustained QPS each cloud vector database can deliver for a given operating spend. They combine measured search results with public pricing models.
[Chart: Cost vs. QPS Pareto. Workload: Single tenant LAION 100M. Cost basis: query only. Unit: USD / hour, lower is better.]
Notes: All tests were conducted in AWS us-west-2. All costs shown here are based on each product's pricing in this region.
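To make the idea of a Pareto cost line concrete, here is a minimal sketch of how such a frontier could be extracted from (cost, QPS) measurements. The `Point` type, the `pareto_frontier` function, and the sample values are hypothetical illustrations, not the benchmark's own code or measured results.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    name: str                 # hypothetical configuration label
    cost_usd_per_hour: float  # operating spend
    qps: float                # sustained queries per second


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep only points that no cheaper-or-equal point matches or beats on QPS."""
    # Sort by cost ascending; on equal cost, put the higher QPS first.
    ordered = sorted(points, key=lambda p: (p.cost_usd_per_hour, -p.qps))
    frontier: List[Point] = []
    best_qps = float("-inf")
    for p in ordered:
        # A point survives only if it beats every cheaper-or-equal point on QPS.
        if p.qps > best_qps:
            frontier.append(p)
            best_qps = p.qps
    return frontier


# Made-up sample values, for illustration only.
samples = [
    Point("A", 1.0, 100.0),
    Point("B", 2.0, 90.0),   # dominated by A (costs more, slower)
    Point("C", 3.0, 400.0),
    Point("D", 3.0, 350.0),  # dominated by C (same cost, slower)
    Point("E", 5.0, 400.0),  # dominated by C (same QPS, costs more)
]
frontier = pareto_frontier(samples)
print([p.name for p in frontier])  # ['A', 'C']
```

Connecting the surviving points in cost order gives one product's cost line. A point drops out whenever another configuration delivers at least the same QPS for the same or lower spend.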
Cloud Insert Case
This case measures the write-to-serve lifecycle: how long bulk data takes to finish inserting, when it can be searched reliably, and when background indexing has fully caught up.
Dataset: LAION 100M

| Product | Batch size | Inserted | Searchable | Fully Indexed | Write Cost |
|---|---|---|---|---|---|
| Zilliz Cloud Capacity 12CU | 1,000 | 2.9 hr | 0 ms | 7.0 min | $9.12 |
| Zilliz Cloud Capacity 12CU | 5,000 | 3.1 hr | 0 ms | 1.8 min | $9.25 |
| Zilliz Cloud Capacity 12CU | 10,000 | 3.2 hr | 0 ms | 1.9 min | $9.5 |
| Zilliz Cloud Tiered 4CU | 1,000 | 4.1 hr | 0 ms | 10.3 min | $6.3 |
| Zilliz Cloud Tiered 4CU | 5,000 | 4.1 hr | 0 ms | 9.6 min | $6.29 |
| Zilliz Cloud Tiered 4CU | 10,000 | 4.1 hr | 0 ms | 10.9 min | $6.34 |
| Turbopuffer | 1,000 | 53.5 hr | 0 ms | 3.4 min | $304 |
| Turbopuffer | 5,000 | 1.9 hr | 6.6 hr | 2.4 min | $302 |
| Turbopuffer | 10,000 | 1.8 hr | 6.4 hr | 2.0 min | $302 |
| Pinecone Serverless | 1,000 | 111.7 hr | 0 ms | 42 ms | $1,180 |
| Pinecone Serverless | 5,000 | 71.4 hr | 0 ms | 1 ms | $1,180 |
| Pinecone Serverless | 10,000 | 72.4 hr | 0 ms | 127 ms | $1,180 |
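As a rough illustration of how the Inserted, Searchable, and Fully Indexed milestones could be timed, the sketch below drives a generic client through three phases. Everything here is an assumption: the `insert_batch`, `is_searchable`, and `is_fully_indexed` callables are hypothetical stand-ins for a real client, and measuring each phase from the end of the previous one is one possible convention, not necessarily the one VectorDBBench uses.

```python
import time
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(items: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def measure_write_to_serve(
    batches,
    insert_batch: Callable[[List], None],    # hypothetical: writes one batch
    is_searchable: Callable[[], bool],       # hypothetical: data can be queried
    is_fully_indexed: Callable[[], bool],    # hypothetical: indexing caught up
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, float]:
    """Time each phase in seconds, measured from the end of the previous one (assumed)."""
    start = clock()
    for batch in batches:
        insert_batch(batch)
    inserted_at = clock()

    # Poll until the backend reports that the data can be searched.
    while not is_searchable():
        sleep(poll_interval_s)
    searchable_at = clock()

    # Poll until background indexing has caught up.
    while not is_fully_indexed():
        sleep(poll_interval_s)
    indexed_at = clock()

    return {
        "inserted_s": inserted_at - start,
        "searchable_s": searchable_at - inserted_at,
        "fully_indexed_s": indexed_at - searchable_at,
    }
```

The `clock` and `sleep` parameters are injectable so the harness can be exercised against a simulated backend without waiting in real time.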
Notes: For the exact definitions of "inserted", "searchable", and "fully indexed", see the VectorDBBench source code for each client. For basic intuition:
Cloud Payload Search Case & Multitenant Search Case
This case measures query behavior after cloud data is already loaded and searchable. It compares peak concurrent QPS, P99 latency, and recall under different response payloads, scalar-filter selectivity, and tenancy modes so the chart shows both throughput and result quality instead of rewarding speed alone.
Settings: mode: Single tenant LAION 100M; filter: unfiltered; payload: ids only; latency: max concurrency P99 latency; topK = 100.

Vector Search Latency and QPS (unfiltered · ids only · Max concurrency P99)

| Product | Payload | Max Concurrency P99 Latency | Max Concurrency QPS | recall@10 | Query cost @ max QPS |
|---|---|---|---|---|---|
| Zilliz Cloud Capacity 32CU | 2,000 bytes/query | 158 ms | 786.1 | 0.9728 | n/a |
| Turbopuffer | 2,000 bytes/query | 2.34 s | 395.7 | 0.9321 | n/a |
| Zilliz Cloud Capacity 12CU | 2,000 bytes/query | 299 ms | 376 | 0.9723 | n/a |
| Turbopuffer Pinned | 2,000 bytes/query | 3.30 s | 68.2 | 0.9321 | n/a |
| Zilliz Cloud Tiered 4CU | 2,000 bytes/query | 5.57 s | 49.2 | 0.9510 | n/a |
| Pinecone Serverless | 2,000 bytes/query | 4.85 s | 4.6 | 0.9609 | n/a |
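For readers unfamiliar with the three metrics reported above, the following sketch shows one common way to compute them from raw measurements. The function names are hypothetical, the set-overlap form of recall@k and the nearest-rank percentile are assumptions, and VectorDBBench may define or compute these quantities differently.

```python
import math
from typing import Sequence


def recall_at_k(retrieved_ids: Sequence[int], ground_truth_ids: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that appear in the returned top-k."""
    truth = set(ground_truth_ids[:k])
    if not truth:
        return 0.0
    return len(set(retrieved_ids[:k]) & truth) / len(truth)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def qps(num_queries: int, wall_seconds: float) -> float:
    """Completed queries divided by wall-clock time across all concurrent workers."""
    return num_queries / wall_seconds


# Made-up values, for illustration only.
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))   # 0.75
print(percentile(list(range(1, 101)), 99))            # 99
print(qps(1000, 4.0))                                 # 250.0
```

Reporting recall next to QPS and P99 latency matters because an index can usually be tuned to answer faster by returning less accurate neighbors.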
Cloud Cold Latency Case
This case measures the first query after an idle cold period against the warmed steady-state query path. It isolates cold-start behavior from normal search throughput so the chart shows whether a product has a material warm-up penalty after inactivity.
Mode: unfiltered

| Product | Cold / Warm Latency | Cold / Warm Ratio (lower is better) |
|---|---|---|
| Zilliz Cloud Capacity 12CU | 55 / 54 ms | 1.01× |
| Turbopuffer Pinned | 64 / 45 ms | 1.42× |
| Zilliz Cloud Tiered 4CU | 122 / 57 ms | 2.16× |
| Pinecone Serverless | 271 / 60 ms | 4.52× |
| Turbopuffer | 2048 / 322 ms | 6.36× |
Notes:
Some products show a more dramatic cold/warm ratio at the P99 percentile, but this usually reflects network jitter in later queries and cannot be reproduced reliably. We therefore use the stricter definition of cold latency: the first query of each round.
When a product's collection becomes cold is ambiguous, since most products offer no public API that exposes this state. To approximate real-world production settings, we wait at least 24 hours after the last operation on a product before benchmarking cold latency, so that its collections are as cold as possible.
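The first-query definition of cold latency can be sketched as follows. This is a minimal illustration under stated assumptions: the `cold_warm` function is hypothetical, taking the median across rounds and treating every later query in a round as "warm" are choices made for this example, and the sample latencies are invented so that they reproduce a 271 / 60 ms pair.

```python
from statistics import median
from typing import List, Sequence, Tuple


def cold_warm(rounds: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (cold_ms, warm_ms, ratio) from per-round latency lists.

    Each round is a list of latencies in milliseconds whose first entry is
    the first query issued after the idle period.
    """
    # Cold latency: the first query of each round (median across rounds, assumed).
    cold = median(r[0] for r in rounds)
    # Warm latency: all remaining queries in every round (median, assumed).
    warm = median(lat for r in rounds for lat in r[1:])
    return cold, warm, cold / warm


# Invented samples: one round, first query cold, the rest warm.
example_rounds: List[List[float]] = [[271.0, 60.0, 61.0, 59.0]]
cold_ms, warm_ms, ratio = cold_warm(example_rounds)
print(f"{cold_ms:.0f} / {warm_ms:.0f} ms, ratio {ratio:.2f}x")  # 271 / 60 ms, ratio 4.52x
```

Using only the first query per round keeps later network jitter out of the cold measurement, at the cost of having very few cold samples per product.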