DiskANN is an algorithm for approximate nearest neighbor (ANN) search on datasets too large to fit in memory. It sidesteps the memory limitation by combining a graph-based index with a disk-aware storage strategy. Unlike purely in-memory approaches such as HNSW, or the index types shipped in libraries like FAISS, which require the entire dataset and index to reside in RAM, DiskANN stores most of the index on disk and keeps only a small, optimized portion in memory for navigation. This makes billion-scale datasets tractable on machines with limited RAM, trading some query speed for a much smaller memory footprint.
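The RAM-versus-disk tradeoff can be made concrete with back-of-envelope arithmetic. The parameters below are illustrative assumptions (one billion 100-dimensional float32 vectors, 4 compressed bytes per vector), chosen only to show the scale of the gap:

```python
# Back-of-envelope memory math for the RAM-vs-disk tradeoff.
# All parameters here are illustrative assumptions, not DiskANN defaults.
N = 1_000_000_000      # number of vectors (assumption)
DIM = 100              # vector dimensionality (assumption)
FLOAT_BYTES = 4        # float32

full_bytes = N * DIM * FLOAT_BYTES   # full-precision vectors -> disk
pq_bytes_per_vec = 4                 # compressed bytes per vector (assumption)
pq_bytes = N * pq_bytes_per_vec      # compressed copies -> RAM

print(f"full-precision vectors: {full_bytes / 1e9:.0f} GB")  # 400 GB
print(f"compressed vectors:     {pq_bytes / 1e9:.0f} GB")    # 4 GB
```

The full-precision data is two orders of magnitude larger than the compressed copy, which is why only the latter is practical to pin in RAM.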
The algorithm constructs a proximity graph (known as Vamana) in which nodes represent data points and edges connect each node to a carefully pruned set of near neighbors. The key design decision is splitting the index into two tiers: a compact in-memory "navigation" layer and a larger on-disk index holding the full-precision vector data. During a query, the in-memory layer guides the search toward relevant regions of the graph, after which the algorithm reads the corresponding vectors from disk for exact distance calculations. For example, a search might traverse the in-memory structure to identify 50 candidate nodes, then fetch their full-precision vectors from disk to rerank the results. To minimize disk I/O, DiskANN optimizes the data layout, placing related vectors in contiguous disk blocks so that a single block read yields multiple candidates and seek operations are kept to a minimum.
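The two-tier query flow can be sketched in miniature. This is a toy stand-in, not DiskANN's actual implementation: a dict plays the role of the disk tier, coarse int8 copies stand in for compressed in-memory vectors, and a simple nearest-neighbor graph replaces the real Vamana graph. The names (`search`, `budget`, etc.) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 400, 16
vectors = rng.standard_normal((n, dim)).astype(np.float32)

# "Disk" tier: full-precision vectors, fetched on demand (a dict standing
# in for block reads from an SSD).
disk = {i: vectors[i] for i in range(n)}

# "Memory" tier: coarse int8 copies standing in for compressed vectors,
# plus a neighbor graph (a toy substitute for the real Vamana graph).
scale = float(np.abs(vectors).max())
mem = np.round(vectors / scale * 127).astype(np.int8)

R = 8
d2 = ((vectors[:, None, :] - vectors[None, :, :]) ** 2).sum(-1)
graph = np.argsort(d2, axis=1)[:, 1:R + 1]   # R nearest neighbors per node

def approx_dist(i, q):
    """Cheap distance using the coarse in-memory copy of vector i."""
    d = mem[i].astype(np.float32) * (scale / 127) - q
    return float((d * d).sum())

def search(query, k=5, budget=50):
    """Greedy traversal with cheap in-memory distances, then rerank the
    visited candidates with full-precision vectors read from 'disk'."""
    visited, frontier = set(), [0]           # start at node 0 (arbitrary)
    while frontier and len(visited) < budget:
        node = min(frontier, key=lambda i: approx_dist(i, query))
        frontier.remove(node)
        visited.add(node)
        frontier.extend(int(j) for j in graph[node]
                        if j not in visited and int(j) not in frontier)
    # refinement phase: exact distances from the "disk" tier
    ranked = sorted(visited,
                    key=lambda i: float(((disk[i] - query) ** 2).sum()))
    return ranked[:k]

query = rng.standard_normal(dim).astype(np.float32)
top = search(query)
```

The `budget` parameter mirrors the "50 candidate nodes" in the example above: it caps how many candidates the cheap in-memory traversal collects before the expensive full-precision reranking.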
DiskANN achieves practical performance through several optimizations. First, it uses vector compression (e.g., product quantization) to shrink the in-memory representation of the vectors consulted during graph traversal. Second, it batches disk reads to amortize latency, fetching multiple candidate vectors in a single I/O operation. Third, it caches frequently accessed disk blocks. For instance, a one-billion-vector dataset might occupy 400 GB of disk but only 2-4 GB of RAM for the navigation layer. Queries are slower than with purely in-memory methods (milliseconds rather than microseconds), but DiskANN keeps latency acceptable (tens of milliseconds) for large-scale applications such as recommendation systems, where trading speed for scalability is worthwhile.
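Product quantization, the compression scheme mentioned above, can be sketched as follows. This is a minimal illustration under assumed toy parameters (4 subspaces, 16 centroids each), not a production PQ implementation; real systems use more subspaces, 256 centroids per codebook, and optimized k-means:

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim, M, K = 1000, 32, 4, 16   # toy sizes: 4 subspaces, 16 centroids each
sub = dim // M
X = rng.standard_normal((n, dim)).astype(np.float32)

def kmeans(data, k, iters=10):
    """Tiny k-means used to learn one codebook per subspace."""
    cent = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        d = ((data[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            pts = data[assign == c]
            if len(pts):
                cent[c] = pts.mean(0)
    return cent

codebooks = [kmeans(X[:, m * sub:(m + 1) * sub], K) for m in range(M)]

# Encode: each vector is reduced to M one-byte centroid ids.
codes = np.empty((n, M), dtype=np.uint8)
for m in range(M):
    part = X[:, m * sub:(m + 1) * sub]
    d = ((part[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(-1)
    codes[:, m] = d.argmin(1)

def adc(query, i):
    """Asymmetric distance: full-precision query vs. compressed code i."""
    total = 0.0
    for m in range(M):
        q = query[m * sub:(m + 1) * sub]
        c = codebooks[m][codes[i, m]]
        total += float(((q - c) ** 2).sum())
    return total

orig_bytes = X.nbytes        # n * dim * 4 full-precision bytes
code_bytes = codes.nbytes    # n * M compressed bytes
print(f"compression: {orig_bytes} -> {code_bytes} bytes "
      f"({orig_bytes // code_bytes}x smaller)")
```

Each 128-byte vector shrinks to 4 bytes here, a 32x reduction; distances computed against the codes are approximate, which is exactly why the full-precision disk reads are still needed for the final reranking step.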