Vector Database Visualization: Feder, A Powerful Tool for Similarity Search
With the help of machine learning (ML) models, we can easily encode unstructured data like photos and videos into embeddings for vector similarity search. To accelerate the search, various indexes like IVF_FLAT and HNSW are adopted. To select an index best suited for the application, users need to trade off between search speed and accuracy.
To save users the trouble, we are proud to announce Feder, a tool for visualizing ANNS (approximate nearest neighbor search) algorithms. With Feder, users can understand different types of indexes and their parameters in an unprecedentedly straightforward way.
Feder enables users to observe how different indexes are structured, how data is organized under each type of index, and how different parameter configurations influence the indexing structure. In addition, Feder visualizes the whole process of a vector similarity search and provides a detailed record of the data accessed during the search.
Currently, Feder only supports the HNSW index from hnswlib. More index types will be supported soon.
Introduction to Vector Databases
Vector databases are a specialized type of database designed to efficiently store, manage, and query large volumes of vector data. Vector data represents complex information—such as images, text, or audio—as numerical vectors in a high-dimensional vector space. These databases are optimized for similarity search, enabling users to find the most similar vectors to a given query vector. This capability is particularly valuable in applications like image recognition, natural language processing, and recommendation systems, where understanding the nuances of data is crucial.
Understanding Feder in Vector Databases
Feder is built with JavaScript. To use Feder for visualization, you first build an index with Faiss or hnswlib and save the index file. Feder then analyzes the uploaded file to obtain the index information and gets ready for visualization. For a vector similarity search, you provide a target vector and the search parameter configuration, and Feder visualizes the whole search process for you.
federjs consists of two parts:
- Feder-Core
- Analyzes index files to obtain detailed information about indexes.
- Supports querying indexes, and keeps a detailed record of the vectors accessed during an index query.
- Feder-View
- Enables visualization of the overall structure of different indexes.
- Enables visualization of the whole similarity search process with different indexes.
In addition to federjs, Feder also provides federpy, a Python tool. With federpy, you can visualize the index structure and search process directly in a Jupyter (IPython) notebook. Alternatively, you can export the visualization to an HTML file and open it in a browser.
Learn more about how to use Feder by reading the Feder user guide.
In this use case, we use VOC 2012, the classic ML image dataset that contains more than 17,000 images.
First, we use Towhee, an open-source ML pipeline tool, to encode the images in the VOC 2012 dataset into vectors. Then we build an HNSW index with hnswlib and save the index file. Finally, we use Feder for visualization.
The link here provides an interactive user experience for you to see the visualization for HNSW.
An HNSW index is multi-layered, and each layer is an interconnected network. The bottom layer contains all data objects in the database, and the nodes become sparser as you move up toward the uppermost layer. Let's draw an analogy to our modern transportation system. If you are traveling from San Francisco to a boutique shop tucked away in the Upper East Side of New York City, you would probably first take a flight to JFK or LaGuardia, find the most convenient subway to Manhattan, and then switch to a bus or even a Citi Bike to reach that neighborhood. Similarly, to quickly find the node nearest to a target, the search starts in the uppermost layer, where searching is fastest. However, more often than not, the sparse upper layers cannot take us all the way to the desired destination, so we descend to the next layer for higher accuracy.
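The layered descent described above can be sketched in a few lines of plain Python. This is a toy illustration, not Feder or hnswlib code: a hand-built three-layer graph over nine 1-D points, where the search greedily moves to whichever neighbor is closest to the target and carries the best node found so far down into the next layer.

```python
# Toy sketch of HNSW-style layered search (not hnswlib internals).
# Upper layers are sparse subsets of the bottom layer.
layers = [
    {0: [8], 8: [0]},                                        # layer 2 (sparsest)
    {0: [4, 8], 4: [0, 8], 8: [0, 4]},                       # layer 1
    {i: [max(i - 1, 0), min(i + 1, 8)] for i in range(9)},   # layer 0 (all points)
]

def greedy_search(target, entry=0):
    visited = set()
    node = entry
    for graph in layers:
        while True:
            visited.add(node)
            # move to the neighbor closest to the target, if that improves
            best = min(graph[node], key=lambda n: abs(n - target))
            if abs(best - target) < abs(node - target):
                node = best
            else:
                break  # local optimum in this layer; drop to the layer below
    return node, visited

node, visited = greedy_search(6.2)
print(node, sorted(visited))  # → 6 [0, 6, 7, 8]
```

With the target 6.2, the search touches only 4 of the 9 points before settling on the nearest one; this is the same effect that lets HNSW skip most of a real dataset.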
When building an HNSW index, one node in the uppermost layer is selected by the algorithm as the entry point from which every search starts.
Below is the visualization of Layers 4, 3, and 2 in a five-layer HNSW index built on the VOC 2012 dataset.
Feder provides an interactive user experience, so you can select any node for a closer look. The path highlighted in yellow represents the shortest path, with the fewest transit nodes, from the entry point to the node you choose. The paths in white show all the other nodes your chosen node can reach. By zooming in, you can see more details, and you will notice that the lower the layer, the more similar the connected objects.
You can view the relevant statistics in the overview panel on the upper left. The parameter M determines how many other nodes a node can link to in each layer. As we can see from the screenshot, M = 8, meaning that any node can link to at most 8 other nodes.
We can modify the value of the parameters to observe how the index structure is affected.
As the value of M increases, the HNSW structure becomes flatter. The effect of modifying ef is less obvious in the visualization; in fact, the parameter ef (efConstruction in hnswlib) influences the links generated during index building.
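To make the role of M concrete, here is a minimal, hypothetical insertion routine over 1-D points. It omits hnswlib's actual neighbor-selection heuristic and back-link pruning, but it shows the core idea: each newly inserted vector links to at most M of its nearest existing neighbors, which is what keeps a node's out-degree bounded.

```python
# Sketch of how M bounds links at insertion time (not hnswlib's algorithm).
def insert(graph, points, new_id, new_val, M):
    points[new_id] = new_val
    # link the new node to at most M nearest existing nodes
    nearest = sorted((i for i in graph), key=lambda i: abs(points[i] - new_val))[:M]
    graph[new_id] = list(nearest)
    for i in nearest:
        graph[i].append(new_id)  # real HNSW also prunes back-links over the limit

points, graph = {}, {}
for i, v in enumerate([0.0, 1.0, 2.5, 2.6, 5.0]):
    insert(graph, points, i, v, M=2)
print({i: sorted(ns) for i, ns in graph.items()})
# → {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
```

Each insertion adds at most M outgoing links, so a larger M yields a denser, flatter graph, which matches what the visualization shows.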
After you upload a target image for search, Feder will display the whole search process with animation.
The animation visualizes the whole process of a vector similarity search.
The visualization displays a record of the data accessed during a vector similarity search. That is to say, you can see every vector whose distance to the target vector was computed, while vectors not involved in the process are not displayed in the animation.
As we can see in the visualization, for HNSW indexes, the search starts from the uppermost layer, finds the closest node to the target in this layer, and then goes down to the next layer if all nodes accessible in this layer are not close enough to the target.
It should be noted that the search in the bottom layer proceeds along multiple paths, and the parameter ef determines how many candidates the search keeps. For a detailed introduction to HNSW, read the paper "Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs".
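The multi-path search at the bottom layer can be sketched as a bounded best-first search. The following is a simplified sketch of the idea, not hnswlib's internals: ef caps the size of the result list the search maintains, so a larger ef explores more alternatives and raises recall at the cost of speed.

```python
import heapq

# Simplified ef-bounded best-first search over 1-D points (not hnswlib code).
def ef_search(graph, points, target, entry, ef):
    dist = lambda i: abs(points[i] - target)
    candidates = [(dist(entry), entry)]    # min-heap of nodes to expand
    results = [(-dist(entry), entry)]      # max-heap of the best ef found so far
    visited = {entry}
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:             # nothing in the frontier can improve
            break
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                if len(results) < ef or dist(nb) < -results[0][0]:
                    heapq.heappush(candidates, (dist(nb), nb))
                    heapq.heappush(results, (-dist(nb), nb))
                    if len(results) > ef:  # ef bounds the retained results
                        heapq.heappop(results)
    return sorted(-d for d, _ in results), visited

points = {i: float(i) for i in range(9)}
graph = {i: [max(i - 1, 0), min(i + 1, 8)] for i in range(9)}
dists, visited = ef_search(graph, points, target=6.2, entry=0, ef=2)
print([round(d, 1) for d in dists])  # → [0.2, 0.8]
```

With ef = 2, the search returns the two nearest points to the target; raising ef keeps more candidates alive and makes it less likely that the true nearest neighbors are pruned away.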
Through the interactive visualization, we can see that the nodes at the beginning of the search path are less relevant, but as the search for the nearest neighbor proceeds, accuracy rapidly rises. The statistics panel on the left shows that only about 1% of the images (around 170 of the 17,000 images in the VOC 2012 dataset) are actually accessed during the search. This tremendous acceleration is made possible by the HNSW index.
You can also set different values for the index parameters and generate new index files to compare the resulting structure and search efficiency.
Key Features of Feder
Feder offers several features that make it a handy companion for working with vector indexes:
Interactive Index Visualization: Feder renders the internal structure of an index, letting you select individual nodes, inspect their links, and zoom in on each layer.
Search Process Animation: Feder replays a similarity search step by step, showing every vector accessed and compared along the way.
Parameter Exploration: By generating index files with different parameter values, you can compare how settings such as M and ef shape the index structure and search behavior.
Notebook and Web Support: federjs runs in the browser, while federpy brings the same visualizations to Python notebooks or standalone HTML files.
Use Cases for Feder
Feder’s visualizations are useful in a range of scenarios, including:
Learning How ANNS Indexes Work: By exposing the structure of an index layer by layer, Feder makes algorithms like HNSW far easier to understand than formulas or pseudocode alone.
Index and Parameter Tuning: Rebuilding an index with different parameter values and comparing the visualizations helps you trade off search speed against accuracy for your application.
Analyzing Search Behavior: The search animation and its record of accessed vectors show exactly how a query traverses the index, which helps when diagnosing recall or performance issues.
Integrating Feder with Vector Databases
Feder is designed to work alongside existing vector search libraries and databases rather than replace them:
Index File Input: Feder reads saved index files (currently the HNSW index from hnswlib, with more index types to come), so you can visualize indexes built in your existing pipeline.
Web and Python Integration: federjs can be embedded in web applications, while federpy fits into Python workflows, running in notebooks or exporting standalone HTML visualizations.
With these integration points, Feder slots into a similarity-search workflow as an inspection and debugging layer, helping you make more informed decisions about your indexes.