Using Vector Search to Better Understand Computer Vision Data

Jun 11, 2024 · 6 min read

How bad can bad data be for your AI? See for yourself:

Data Quality Issues

Bad data can sabotage your AI-powered applications and workflows, with serious consequences that go well beyond a frustrated user. That should come as no surprise.

Many of us have envisioned leveraging the convenience and versatility of multimodal large language models (LLMs) to tap into the rich information in images and videos. Make wonders, take it to the next level, as the cliché goes. Computer vision offers that kind of boundless opportunity for new and more fulfilling services. But there are challenges along the way.

A critical one is how to curate better data and pair it with the right model for improved results. Because models are complex and data is high-dimensional, this work often involves a lot of fine-tuning and brute-force trial and error in the dark, draining resources away from innovation.

What if we could bring transparency and clarity to visual AI workflows, making it fast and even fun? Voxel51 took it as a mission and delivered!

Jacob Marks, a machine learning engineer and developer evangelist at Voxel51, demonstrated this at the SF Unstructured Data Meetup.

Making Visual AI a Reality

The explosion of new apps and services powered by generative AI and machine learning has revealed the importance of harnessing unstructured data and the role of vector databases as a game-changer. Jacob Marks showed in his presentation how integrating vector databases with tools like FiftyOne, Voxel51's open source project, is revolutionizing the exploration, visualization, and curation of visual data to build AI-powered applications more efficiently and more reliably. It allows you to test and assess models by feeding them the exact datasets they need to ensure robust, accurate results.

It All Starts with Data Quality

Why? Because better data leads to better models, accelerating the path to success.

“Nothing hinders the success of machine learning systems more than poor quality data,” says Jacob. Preparing data and finding the right model can be time-consuming and inefficient without the right tools. Even skilled ML engineers need good tools to build high-quality datasets and models.

FiftyOne simplifies visual data handling, making it easier and faster to understand operations, adjustments, and results.

You can visualize complex labels, evaluate models, explore scenarios of interest, identify failure modes, and find annotation mistakes, among other tasks. This is achieved through LLM chains running in the background, generating embeddings, and querying a vector database.

Now, let’s cut to the chase!

From RAG to Riches: The Power of Vector Search

Retrieval-Augmented Generation (RAG) is one of the reasons vector databases became popular.

RAG improves the accuracy of large language models by combining retrieval-based and generative approaches, which raises the quality and relevance of the generated text.

This technique uses a model to convert the user prompt into an embedding and compares it to the stored vector embeddings. The semantic similarity search retrieves the closest matches, which give the LLM the context it needs for more accurate and context-rich responses.

RAG
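To make the retrieval step concrete, here is a minimal sketch in plain Python. It is not how any particular RAG framework is implemented. The `store` list stands in for a vector database, the three-dimensional vectors are made-up placeholders for real embeddings, and `retrieve` and `build_prompt` are hypothetical helper names chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, store, k=2):
    """Return the k passages whose embeddings are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:k]]

def build_prompt(question, passages):
    """Put the retrieved passages in front of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Made-up embeddings standing in for a vector database (assumption).
store = [
    {"text": "FiftyOne is an open source toolkit for visual datasets.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Milvus is an open source vector database.", "vector": [0.1, 0.9, 0.0]},
    {"text": "Cane corsos are large dogs.", "vector": [0.0, 0.1, 0.9]},
]
question = "Which database stores the embeddings?"
query_vector = [0.2, 0.8, 0.1]  # assumed output of an embedding model for `question`

prompt = build_prompt(question, retrieve(query_vector, store, k=1))
print(prompt)
```

In a real system the prompt would then be sent to the LLM, and the sorted list would be replaced by an approximate nearest neighbor query against the vector database.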

You can compare vectors to measure how similar the underlying inputs are. If you take two text inputs and embed them as vectors, you can measure their closeness with whichever metric you choose: Euclidean distance, cosine similarity, or the dot product.

Vector similarity
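The three metrics are easy to compute by hand. The sketch below uses tiny made-up vectors (the names `cat`, `kitten`, and `car` are purely illustrative, and real embeddings have hundreds of dimensions) to show what each metric measures.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical 3-dimensional embeddings, chosen so the arithmetic is easy to follow.
cat = [1.0, 2.0, 2.0]
kitten = [2.0, 4.0, 4.0]  # same direction as `cat`, twice the length
car = [2.0, -1.0, 0.0]    # orthogonal to `cat`

print(euclidean_distance(cat, kitten))  # 3.0
print(cosine_similarity(cat, kitten))   # 1.0
print(dot(cat, car))                    # 0.0
```

Euclidean distance is sensitive to vector length, cosine similarity only looks at direction, and the dot product mixes both, so it helps to use the metric the embedding model was trained with.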

You can also use multimodal embeddings to handle different types of data together, such as text, images, and videos, in the same vector space.

Vector embeddings

Vector Search for Computer Vision

Voxel51 integrated Milvus using Zilliz Cloud to unleash vector search capabilities on visual datasets. Here are some powerful use cases:

Image Similarity: Similarity search is a common use case and it is made even easier with Voxel51.

You just need to select the image you are interested in from your dataset and use it to search for similar ones. All the embedding and query steps happen behind the scenes, which keeps the visual experience intuitive and clickable. For instance, you can set attributes such as the metric and the k-value by selecting them in the GUI.

Image similarity search
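Under the hood, a query like this boils down to a k-nearest-neighbor search over the image embeddings. The sketch below is a brute-force version in plain Python, not the FiftyOne or Milvus implementation. The file names and vectors are made up, and `most_similar` is a hypothetical helper whose `metric` and `k` arguments mirror the two GUI settings mentioned above.

```python
import heapq
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

METRICS = {"cosine": cosine_distance, "euclidean": euclidean_distance}

def most_similar(query_id, embeddings, k=3, metric="cosine"):
    """Return the ids of the k samples closest to the sample `query_id`."""
    distance = METRICS[metric]
    query = embeddings[query_id]
    candidates = (
        (distance(query, vector), sample_id)
        for sample_id, vector in embeddings.items()
        if sample_id != query_id  # do not return the query image itself
    )
    return [sample_id for _, sample_id in heapq.nsmallest(k, candidates)]

# Made-up image embeddings keyed by file name (assumption).
embeddings = {
    "dog_001.jpg": [0.9, 0.1, 0.0],
    "dog_002.jpg": [0.8, 0.2, 0.1],
    "cat_001.jpg": [0.1, 0.9, 0.1],
    "car_001.jpg": [0.0, 0.1, 0.9],
}

print(most_similar("dog_001.jpg", embeddings, k=2))
# ['dog_002.jpg', 'cat_001.jpg']
```

A vector database replaces the linear scan with an index, so the same query stays fast on millions of images.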

Just as easily, you can run a reverse search using an external image.

Let’s say that you want to find out whether you have a cane corso dog in your visual dataset. You just need to provide the URI for the image. It is automatically vectorized and queried for similarity against the visual dataset in the vector space.

Reverse image search

Object search: Beyond whole images, vector databases can handle object detection patches, enabling more precise searches within sub-images. This is useful for tasks like facial recognition or identifying objects within large datasets.

Object similarity search

Because the object you are searching for is most likely not the entire image, computing an embedding for the whole image can be less effective: it will not always be similar to the embedding of the object itself.
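A toy example shows why patch embeddings help. In the sketch below, `toy_embedding` is a deliberately crude, hypothetical stand-in for a vision model (it only counts dark and bright pixels), and `detection_box` is an assumed detector output. The point is only to compare the query against the whole image and against the cropped patch.

```python
import math

def crop(image, box):
    """Cut the (top, left, bottom, right) box out of a 2D grid of pixels."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

def toy_embedding(image):
    """Hypothetical stand-in for a vision model: share of dark and bright pixels."""
    pixels = [p for row in image for p in row]
    bright = sum(1 for p in pixels if p >= 128) / len(pixels)
    return [1.0 - bright, bright]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# 4x4 grayscale scene: dark background with a bright 2x2 object in one corner.
scene = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
detection_box = (2, 2, 4, 4)  # assumed output of an object detector
query_object = [[255, 255], [255, 255]]

query = toy_embedding(query_object)
whole = cosine_similarity(query, toy_embedding(scene))
patch = cosine_similarity(query, toy_embedding(crop(scene, detection_box)))
print(round(whole, 3), round(patch, 3))  # 0.316 1.0
```

The background dominates the whole-image embedding, so the query matches the cropped patch far better than the full scene.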

OCR Search: Another use case is interactive Optical Character Recognition (OCR) document search. You can interact with the text embeddings visually and see where on each page of your documents the results come from.

Robust OCR document search

Cross-modal Retrieval: Tools like OpenAI's CLIP and Meta's ImageBind allow text and image embeddings to share the same space. This enables cross-modal retrieval, where users can search for images using embeddings of text descriptions, audio clips, and so on, or vice versa. In his example, an audio clip of a train was embedded and then compared to all the images to find trains in the dataset.

Cross-modal retrieval

Perceptual Similarity: Perceptual similarity allows us to understand how different models perceive the world by comparing model representations in vector space. Some models are very semantic, capturing high-level details and concepts but not the palette of the image at the pixel level, as in the picture below:

Probing perceptual similarity

More traditional computer vision models built on convolutional neural networks capture every pixel and patch, but they won't get the meaning right, as you can see in the image below.

Probing perceptual similarity-2

You can compare model representations by looking at how their results are distributed in the vector space. Some models cluster their results together, while others do not. They see the world differently, and understanding when to apply these different perspectives is fundamental to the quality of your AI.

More Innovation Is Coming in Visual Vector Search

Concept Interpolation: Concept interpolation takes two text concepts and interpolates findings between them. In the example, the initial embeddings for a husky and a chihuahua were given to search for anything that could fit between them, including a cat!

Concept interpolation
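One simple way to picture concept interpolation is as a weighted blend of the two concept embeddings, followed by a nearest neighbor search. The sketch below assumes plain linear interpolation with re-normalization, which the talk does not confirm as the actual method. The two-dimensional vectors and file names are made up for illustration.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def interpolate(a, b, t):
    """Blend two embeddings: t=0 returns a, t=1 returns b (re-normalized)."""
    return normalize([(1 - t) * x + t * y for x, y in zip(a, b)])

def nearest(query, embeddings):
    """Return the key whose embedding has the highest cosine similarity to query."""
    def score(name):
        return sum(q * x for q, x in zip(query, normalize(embeddings[name])))
    return max(embeddings, key=score)

# Made-up text embeddings for the two concepts and image embeddings for the dataset.
husky = normalize([1.0, 0.0])
chihuahua = normalize([0.0, 1.0])
images = {
    "husky.jpg": [0.95, 0.05],
    "cat.jpg": [0.5, 0.5],
    "chihuahua.jpg": [0.05, 0.95],
}

for t in (0.0, 0.5, 1.0):
    print(t, nearest(interpolate(husky, chihuahua, t), images))
# 0.0 husky.jpg
# 0.5 cat.jpg
# 1.0 chihuahua.jpg
```

Halfway between the two concepts, the closest image is neither dog, which is how something like a cat can show up in the results.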

Concept Space Traversal: With concept space traversal, users can combine and manipulate embeddings to adjust attributes in the space of possible embeddings. In the example, the healthiness and colorfulness attributes are adjusted during the search.

Concept space traversal
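A rough mental model for traversal is vector arithmetic: start from a base embedding and add each attribute direction scaled by its slider value. The sketch below assumes that additive model, which may differ from what the tool does internally. The three made-up axes, the `directions` names, and the image vectors are illustrative only.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def traverse(base, directions, weights):
    """Move `base` along each named direction by its slider weight."""
    moved = list(base)
    for name, weight in weights.items():
        moved = [m + weight * d for m, d in zip(moved, directions[name])]
    return normalize(moved)

def nearest(query, embeddings):
    """Return the key whose embedding has the highest cosine similarity to query."""
    def score(name):
        return sum(q * x for q, x in zip(query, normalize(embeddings[name])))
    return max(embeddings, key=score)

# Toy space with three made-up axes: food, healthiness, colorfulness.
base = [1.0, 0.0, 0.0]  # assumed embedding of the starting concept, "food"
directions = {"healthy": [0.0, 1.0, 0.0], "colorful": [0.0, 0.0, 1.0]}
images = {
    "salad.jpg": [1.0, 0.9, 0.8],
    "oatmeal.jpg": [1.0, 0.8, -0.5],
    "candy.jpg": [1.0, -0.7, 0.9],
    "burger.jpg": [1.0, -0.8, 0.1],
}

print(nearest(traverse(base, directions, {"healthy": 1.0, "colorful": 1.0}), images))
# salad.jpg
print(nearest(traverse(base, directions, {"healthy": -1.0, "colorful": 1.0}), images))
# candy.jpg
```

Sliding the healthiness weight from positive to negative moves the query across the space, and the nearest image changes with it.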

A lot happens behind the scenes to make this work. The search combines embeddings for text, images, and other modalities, with each attribute set at the desired level, to make the search space more dynamically explorable. All that is left for you to do is click or slide to your selection. That easy!

Conclusion

Vector databases are indispensable in computer vision, giving visual dataset tools a powerful engine for data exploration, model evaluation, and innovative search using multimodal embeddings, concept interpolation, and concept space traversal. As AI continues to evolve, integrating vector databases will play a crucial role in shaping the future of unstructured data-driven technologies.

The sky is the limit, as Jacob says. Get hands-on and have fun with visual AI powered by a vector database with this handy tutorial.

If you wish to learn more or start your own computer vision project, you are welcome to join our Discord channel. We provide a wealth of resources and a supportive community to help you get started.

Updated on Jul 01, 2025

Daniella Pontes
