Feder: A Powerful Visualization Tool for Vector Similarity Search
By Min Tian, transcreated by Angela Ni, on May 06, 2022
With the help of machine learning (ML) models, we can easily encode unstructured data like photos and videos into embeddings for vector similarity search. To accelerate the search, various indexes like IVF_FLAT and HNSW are adopted. To select an index best suited for the application, users need to trade off between search speed and accuracy.
To save users this trouble, we are proud to announce Feder, a tool for visualizing ANNS (approximate nearest neighbor search) algorithms. With Feder, users can understand different types of indexes and their parameters in an unprecedentedly straightforward way.
Feder enables users to observe how different indexes are structured, how data are organized by each type of index, and how different parameter configurations influence the indexing structure. In addition, Feder also visualizes the whole process of vector similarity search and provides a detailed record of data access during the search.
Currently, Feder only supports the HNSW index from hnswlib. More indexes will be supported soon.
federjs handles the following tasks:
- Analyzes index files to obtain detailed information about indexes.
- Supports querying indexes, and keeps a detailed record of the vectors accessed during an index query.
- Enables visualization of the overall structure of different indexes.
- Enables visualization of the whole similarity search process with different indexes.
In addition to federjs, Feder also provides federpy, a Python tool. With federpy, you can directly visualize the index structure and search process in a Jupyter Notebook. Alternatively, you can export the visualization to an HTML file and open it in a browser.
Learn more about how to use Feder by reading the Feder user guide.
A use case of visualizing HNSW index
In this use case, we use VOC 2012, the classic ML image dataset that contains more than 17,000 images.
First, we use Towhee, an open-source ML pipeline tool, to encode the images in the VOC 2012 dataset into embedding vectors. Then we build an index with hnswlib and save the index file. Finally, we use Feder for visualization.
The link here provides an interactive user experience for you to see the visualization for HNSW.
Visualizing the HNSW index structure
An HNSW index is multi-layered, and each layer is an interconnected network. The bottom layer contains all data objects in the database, and the nodes become sparser toward the uppermost layer. Let's draw an analogy to our modern transportation system. To travel from San Francisco to a boutique shop tucked away on the Upper East Side of New York City, you would probably first take a flight to JFK or LaGuardia, then take the most convenient metro line to Manhattan, and finally switch to a bus or even a Citi Bike for the last few blocks. Similarly, to quickly find the node nearest to a target, the search starts in the uppermost layer, where each hop covers a lot of ground. However, more often than not, the sparse upper layers cannot take us all the way to the desired destination or deliver the expected results, so the search descends to the next layer down for higher accuracy.
When building an HNSW index, one node in the uppermost layer is selected by the algorithm as the entry point for the search.
Below is the visualization of Layers 4, 3, and 2 in a five-layered HNSW index built on the VOC 2012 dataset.
Feder provides an interactive user experience, so you can choose any node for a closer look. The path highlighted in yellow represents the shortest path (the one with the fewest transit nodes) from the entry point to the node you chose. The paths in white show all the other nodes your chosen node can reach. By zooming in, you can see more details, and you will notice that the lower the layer, the more similar the connected nodes.
You can view relevant statistics in the overview panel on the upper-left side. The parameter M determines the maximum number of other nodes each node can connect to in each layer. As we can see from the screenshot, M = 8. This means that starting from any node, the maximum number of nodes it can reach directly is 8.
We can modify the value of the parameters to observe how the index structure is affected.
As the value of M increases, the HNSW structure becomes flatter. The effect of modifying ef is less obvious in the visualization; in fact, the parameter ef influences the links generated during index building.
Visualizing search process with the HNSW index
After you upload a target image for search, Feder will display the whole search process with animation.
The animation visualizes the whole process of vector similarity search.
The visualization displays a record of the data accessed in a vector similarity search. That is to say, you can see every vector whose distance to the target vector was computed during the search, while vectors not involved in this process are not displayed in the animation.
As we can see in the visualization, for HNSW indexes, the search starts from the uppermost layer, finds the node closest to the target in that layer, and then descends to the next layer when no node accessible in the current layer is closer to the target.
It should be noted that the search in the bottom layer proceeds along multiple paths. The parameter ef determines how many candidate paths are kept. For a detailed introduction to HNSW, read the paper "Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs".
Through the interactive visualization, we can see that the nodes at the beginning of the search path are less relevant, but as the search for the nearest neighbor proceeds, search accuracy rises rapidly. The statistics panel on the left shows that only about 1% of the images (around 170) of the 17,000+ images in the VOC 2012 dataset are actually accessed during the search. This tremendous acceleration is made possible by the HNSW index.
You can also set different values for the index parameters and generate new index files to compare the structure and search efficiency.