A Hierarchical Navigable Small World (HNSW) graph index is a data structure designed for efficient approximate nearest neighbor (ANN) searches in high-dimensional vector spaces. It organizes vectors into a layered graph, where each layer represents a simplified version of the data, with higher layers containing fewer nodes and sparser connections. This hierarchy allows the algorithm to quickly narrow down the search area by starting at the top layer and progressively refining results in lower layers. For example, imagine searching for a book in a library: instead of checking every shelf, you first identify the correct section (top layer), then the aisle (middle layer), and finally the exact shelf (bottom layer). HNSW mimics this approach, reducing the number of distance computations required compared to brute-force or flat index methods.
The construction of an HNSW graph involves probabilistic insertion and small-world properties. When adding a vector, it is inserted into the bottom layer and randomly assigned to higher layers with exponentially decreasing probability. This creates a hierarchy where higher layers have fewer nodes and long-range connections, enabling fast traversal across the graph. During a search, the algorithm starts at the top layer, greedily moves toward the query vector's nearest neighbors, and uses these as entry points for the next layer. This "zoom out, then zoom in" process leverages the hierarchy to avoid exhaustive searches in dense lower layers. Parameters like M
(maximum connections per node) and efSearch
(search depth) balance speed and accuracy—higher values improve recall but increase computational cost.
HNSW achieves O(log n) search time complexity, outperforming alternatives like Navigable Small World (NSW) graphs. It is widely used in vector databases (e.g., FAISS, Milvus) and applications like recommendation systems or image retrieval. Key considerations include tuning parameters for specific datasets and balancing memory usage. For instance, in a real-time recommendation system, HNSW enables quick retrieval of similar user embeddings while tolerating minor accuracy trade-offs. Its efficiency and scalability make it a popular choice for high-dimensional ANN tasks where exact results are impractical.