As the number of vectors grows from 1 million to 1 billion, index build time and query performance degrade, but the rate of degradation depends on the indexing algorithm and hardware constraints. For most approximate nearest neighbor (ANN) methods like HNSW, IVF, or LSH, build time scales superlinearly (worse than linear), while query latency increases sublinearly if parameters are tuned appropriately.
Index Build Time

Building an index for 1 billion vectors typically scales worse than linearly. For example, HNSW constructs a hierarchical graph in which each insertion costs roughly O(log n), giving an overall build time of O(n log n); doubling the dataset from 500M to 1B vectors therefore increases build time by more than 2x. IVF-based methods cluster vectors into buckets using iterative clustering (e.g., k-means), which scales with O(nk), where k is the number of clusters. If k grows with the dataset (e.g., from 1k to 10k clusters for 1B vectors), build time becomes superlinear. Distributed systems or GPU acceleration can mitigate this, but single-node builds often hit memory and compute bottlenecks.
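To make the build-time comparison concrete, here is a minimal sketch using FAISS that times HNSW and IVF construction at two scaled-down dataset sizes. The dimensionality, dataset sizes, and parameters (M, efConstruction, the nlist heuristic) are illustrative assumptions, not tuned values, and real 1B-scale builds behave differently due to memory pressure.

```python
# Sketch: compare index build time at two dataset sizes with FAISS.
# Sizes, dimensionality, and parameters are illustrative placeholders.
import time
import numpy as np
import faiss

d = 128                      # vector dimensionality (assumed)
sizes = [100_000, 200_000]   # scaled-down stand-ins for the 500M -> 1B comparison

for n in sizes:
    xb = np.random.rand(n, d).astype("float32")

    # HNSW: each insertion costs roughly O(log n), so total build is ~O(n log n).
    t0 = time.perf_counter()
    hnsw = faiss.IndexHNSWFlat(d, 32)          # M = 32 graph neighbors per node
    hnsw.hnsw.efConstruction = 200             # build-time candidate list size
    hnsw.add(xb)
    t_hnsw = time.perf_counter() - t0

    # IVF: k-means training scales with n * nlist; nlist is often grown with n.
    nlist = int(4 * np.sqrt(n))                # common heuristic: nlist ~ sqrt(n)
    t0 = time.perf_counter()
    quantizer = faiss.IndexFlatL2(d)
    ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)
    ivf.train(xb)                              # k-means clustering step
    ivf.add(xb)
    t_ivf = time.perf_counter() - t0

    print(f"n={n:>9,}  HNSW build: {t_hnsw:6.1f}s  "
          f"IVF build (nlist={nlist}): {t_ivf:6.1f}s")
```

Doubling n in this sketch should show HNSW build time growing by somewhat more than 2x, while the IVF build grows faster still because nlist increases along with n.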
Query Performance

Query latency tends to scale sublinearly when the indexing strategy adapts to larger datasets. For HNSW, search time grows roughly logarithmically (O(log n)) because the layered graph enables efficient traversal even at scale; with 1B vectors, a well-tuned HNSW index might answer queries 2-3x slower than at 1M vectors, assuming fixed parameters. IVF-based methods scan a fixed number of clusters per query (the nprobe setting), so per-query work stays roughly constant if the cluster count grows in proportion to n and cluster sizes remain stable. However, maintaining recall often requires increasing nprobe as n grows, which pushes query latency back toward linear scaling: probing 100 clusters instead of 10 for 1B vectors increases compute per query by roughly 10x.
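The nprobe/recall trade-off is easy to observe directly. The sketch below, again using FAISS on a small synthetic dataset, measures per-query latency and recall@k for a few nprobe settings; the dataset size, nlist, and nprobe values are assumptions chosen only to illustrate the trend.

```python
# Sketch: how nprobe trades query latency against recall on an IVF index.
# Dataset size, nlist, and nprobe values are illustrative placeholders.
import time
import numpy as np
import faiss

d, n, nq, k = 128, 200_000, 1_000, 10
xb = np.random.rand(n, d).astype("float32")
xq = np.random.rand(nq, d).astype("float32")

# Ground truth from an exact (flat) index, used only to measure recall.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
_, gt = flat.search(xq, k)

nlist = 1024
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)
ivf.train(xb)
ivf.add(xb)

for nprobe in (1, 10, 100):
    ivf.nprobe = nprobe                        # clusters scanned per query
    t0 = time.perf_counter()
    _, ids = ivf.search(xq, k)
    dt = time.perf_counter() - t0
    # Recall@k: fraction of true nearest neighbors recovered per query.
    recall = np.mean([len(set(ids[i]) & set(gt[i])) / k for i in range(nq)])
    print(f"nprobe={nprobe:>4}  latency={1000 * dt / nq:6.2f} ms/query  "
          f"recall@{k}={recall:.3f}")
```

Raising nprobe from 1 to 100 scans roughly 100x more vectors per query, which is the linear-in-probes cost the paragraph above describes; the payoff is the corresponding jump in recall.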
Scaling Behaviors

In practice, build time often falls between O(n log n) and O(n²) depending on the method (exact k-NN graph construction sits at the O(n²) end), while ANN algorithms like HNSW or DiskANN achieve sublinear query scaling (roughly O(log n)) with careful parameterization. However, hardware limits such as memory bandwidth and cache size impose practical bottlenecks. For instance, searching 1B vectors on a single machine can require hundreds of gigabytes of RAM (1B 128-dimensional float32 vectors alone occupy about 512 GB before index overhead), forcing compression or distributed solutions that introduce network latency. Observed scaling is rarely clean: ANN methods trade accuracy for speed, so the relationship between n and performance depends heavily on how much recall degradation is acceptable. Tools like FAISS or Milvus expose these trade-offs as configurable parameters, but developers should expect diminishing returns in query speed once n exceeds roughly 100M without infrastructure optimizations.
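A quick back-of-the-envelope calculation shows why compression matters at this scale. The sketch below assumes 128-dimensional float32 vectors and a 32-byte-per-vector product-quantization (PQ) budget; both figures are assumptions for illustration and exclude graph or IVF index overhead.

```python
# Sketch: memory footprint of 1B vectors, raw float32 vs. PQ-compressed codes.
# Dimensionality and PQ code size are assumptions; index overhead is excluded.
n = 1_000_000_000          # number of vectors
d = 128                    # dimensionality (assumed)

raw_bytes = n * d * 4      # float32 = 4 bytes per component
pq_bytes = n * 32          # e.g., 32-byte PQ codes per vector (assumed budget)

print(f"raw float32 : {raw_bytes / 1e9:,.0f} GB")   # ~512 GB
print(f"PQ, 32 B/vec: {pq_bytes / 1e9:,.0f} GB")    # ~32 GB
```

The roughly 16x reduction is what makes single-machine billion-scale search feasible at all, at the cost of the recall loss introduced by quantization.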