Memory access patterns and cache misses significantly impact the performance of vector search algorithms by dictating how efficiently the CPU can utilize cached data. When data is accessed sequentially (spatial locality), the CPU can prefetch contiguous memory blocks into cache lines, reducing latency. Conversely, random access patterns (common in graph-based indexes) disrupt spatial locality, leading to frequent cache misses as the CPU fetches scattered data from slower main memory. Each cache miss adds latency that grows with the level of the memory hierarchy that has to service it: an L1 hit costs a few cycles, while a trip to main memory costs on the order of hundreds of cycles, directly slowing down query processing. Throughput, the number of queries handled per second, is also affected because high cache miss rates force the CPU to wait for data, leaving compute resources underutilized.
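To make the effect concrete, here is a minimal, self-contained C++ sketch (the array size, the `time_pass` helper, and the expectation of "several times slower" are illustrative assumptions, not measurements from any specific machine). It sums the same data once in sequential order and once through a shuffled index; the only difference between the two passes is the memory access pattern.

```cpp
// Minimal sketch: sequential vs. random traversal of the same data.
// Illustrative only; exact timings depend on CPU, cache sizes, and compiler flags.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const size_t n = 1 << 24;              // ~16M floats (~64 MB), larger than a typical LLC
    std::vector<float> data(n, 1.0f);

    // Two visit orders: identity (sequential) and shuffled (random).
    std::vector<size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::vector<size_t> shuffled = order;
    std::shuffle(shuffled.begin(), shuffled.end(), std::mt19937{42});

    auto time_pass = [&](const std::vector<size_t>& idx) {
        auto t0 = std::chrono::steady_clock::now();
        volatile float sum = 0.0f;          // volatile keeps the loop from being optimized away
        for (size_t i : idx) sum = sum + data[i];
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    };

    std::cout << "sequential: " << time_pass(order)    << " ms\n";
    std::cout << "random:     " << time_pass(shuffled) << " ms\n";  // typically several times slower
}
```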
Graph-based indexes like HNSW (Hierarchical Navigable Small World) and flat indexes (e.g., brute-force scans or IVF) exhibit distinct memory behaviors. HNSW traverses nodes by hopping between neighbors, which often involves random jumps in memory. This leads to poor cache line utilization, as each node access may only use a fraction of the fetched cache line (e.g., loading a 64-byte cache line just to read an 8-byte neighbor pointer). In contrast, flat indexes like IVF partition data into clusters and store each cluster's vectors contiguously, allowing sequential scans during search. These scans exploit spatial locality, minimizing cache misses: every byte of each fetched cache line is consumed as the scan walks consecutive vector elements, and the hardware prefetcher can stream upcoming lines ahead of the distance arithmetic. However, graph-based methods compensate for their cache inefficiency by requiring fewer distance computations overall (e.g., HNSW skips non-promising regions via graph hops), while flat indexes trade higher computational cost for predictable memory access.
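The C++ sketch below is not an HNSW or IVF implementation; it only mimics the two access patterns. The `greedy_walk`, `scan_cluster`, and `l2` helpers, the 128-dimension constant, and the flat adjacency representation are assumptions made for illustration: the graph walk chases neighbor IDs into arbitrary positions of the vector array, while the cluster scan streams through one contiguous block.

```cpp
// Simplified sketch contrasting the two access patterns (not a real HNSW/IVF
// implementation): a greedy graph walk hops to arbitrary vector IDs, while an
// IVF-style scan walks one cluster's vectors stored back-to-back in memory.
#include <cstddef>
#include <limits>
#include <vector>

constexpr size_t DIM = 128;

float l2(const float* a, const float* b) {
    float d = 0.0f;
    for (size_t i = 0; i < DIM; ++i) { float t = a[i] - b[i]; d += t * t; }
    return d;
}

// Graph walk: neighbor IDs point anywhere in `vectors`, so each hop is likely
// a cache miss on both the adjacency list and the vector data it references.
size_t greedy_walk(const std::vector<float>& vectors,
                   const std::vector<std::vector<size_t>>& neighbors,
                   const float* query, size_t start) {
    size_t cur = start;
    float best = l2(&vectors[cur * DIM], query);
    bool improved = true;
    while (improved) {
        improved = false;
        for (size_t nb : neighbors[cur]) {                 // scattered reads
            float d = l2(&vectors[nb * DIM], query);
            if (d < best) { best = d; cur = nb; improved = true; }
        }
    }
    return cur;
}

// IVF-style scan: one cluster's vectors are contiguous, so the loop streams
// through memory and the prefetcher can stay ahead of the distance math.
size_t scan_cluster(const float* cluster, size_t count, const float* query) {
    size_t best_idx = 0;
    float best = std::numeric_limits<float>::max();
    for (size_t i = 0; i < count; ++i) {                   // sequential reads
        float d = l2(cluster + i * DIM, query);
        if (d < best) { best = d; best_idx = i; }
    }
    return best_idx;
}
```

The graph walk typically wins on the number of `l2` calls, while the cluster scan wins on the cost per call, which is exactly the trade-off described above.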
The trade-offs depend on data size and hardware. For smaller datasets, flat indexes may outperform graph-based ones because the working set fits largely in the last-level cache, so even a brute-force scan incurs few misses. For larger datasets, graph-based indexes like HNSW often achieve lower latency despite higher cache miss rates because their roughly logarithmic search complexity reduces the total number of distance computations. Throughput, however, can favor flat indexes when queries are processed in batches, since sequential access enables vectorized (SIMD) operations and effective prefetching. Optimizations like reordering memory layouts (e.g., grouping a graph node's neighbor list and vector data to improve locality) or using SIMD instructions in flat scans can further tilt the balance. Ultimately, the choice hinges on whether the algorithm's computational savings (graph) outweigh its memory subsystem penalties.
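As an example of the SIMD point, here is a hedged AVX sketch of an L2 distance over contiguously stored vectors (the `l2_avx` name, the assumption that the dimension is a multiple of 8, and the choice of plain AVX rather than FMA are illustrative, not taken from any library). Because the data is laid out back-to-back, each 256-bit load consumes eight consecutive floats, so whole cache lines are fully used and eight lanes are processed per instruction.

```cpp
// Sketch of a SIMD distance kernel for a flat scan over contiguous vectors.
// Compile with e.g. -mavx; assumes dim is a multiple of 8 for brevity.
#include <immintrin.h>
#include <cstddef>

float l2_avx(const float* a, const float* b, size_t dim) {
    __m256 acc = _mm256_setzero_ps();
    for (size_t i = 0; i < dim; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);     // 8 contiguous floats per load
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 diff = _mm256_sub_ps(va, vb);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(diff, diff));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);               // reduce the 8 partial sums
    float sum = 0.0f;
    for (float v : lanes) sum += v;
    return sum;
}
```

A kernel like this pairs naturally with the contiguous IVF-style layout; the same instructions gain much less in a graph traversal, where the next vector to load is rarely adjacent to the previous one.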