Bosch Achieves 80% Cost Reduction and Optimized Search Efficiency with Milvus
80% Reduction in Data Collection Costs
~$1.4M Reduction in Annual Storage Costs
Millisecond-level Retrieval for Billions of Data Points with a Scalable Architecture
When we identify a need for specific data, we can often find the required data in our database the same day using text or image search with Milvus. This greatly improves our data processing efficiency and has a positive effect on our business operations.
Gong Zhang
About BOSCH
Headquartered in Germany, BOSCH is a global leader in automotive technologies and components, known for its pioneering innovations and long-standing expertise in autonomous driving. The company provides autonomous driving solutions, including advanced driver assistance systems (ADAS) such as adaptive cruise control, lane-keeping assistance, and automated parking systems, which are trusted by leading automotive manufacturers worldwide.
The Challenge: Acquiring Image Datasets for Corner Cases
In autonomous driving, "corner cases" refer to rare, unexpected, or extreme situations such as sudden dense fog, heavy rain, snowstorms, or unexpected obstacles like pedestrians, animals, or unconventional vehicles. These situations pose significant challenges to the perception systems of autonomous vehicles, including radar, cameras, and LiDAR.
Automotive engineers must ensure that autonomous driving systems can safely and reliably navigate these edge cases. However, acquiring image datasets that accurately represent these situations is difficult because such cases occur rarely and often require specialized conditions or environments to reproduce. Collecting corner case image datasets with traditional data collection methods is both time-consuming and expensive, posing a significant obstacle for developers aiming to enhance the safety and reliability of autonomous vehicles.
BOSCH’s Intelligent Drive Control team encountered this exact challenge. They needed to find a way to efficiently and cost-effectively gather image datasets that could accurately depict these difficult situations. Without such data, it would be impossible to thoroughly test and refine the autonomous systems to perform safely under all conditions.
Exploring AI Solutions: Integrating LLMs and Vector Databases
To tackle its challenges, BOSCH’s Intelligent Drive Control team has explored various strategies over the years.
Initially, the team collected data for corner cases manually. This approach required a large fleet of vehicles and significant manpower to wait for these rare scenarios and gather data. It was time-consuming, inefficient, and reliant on chance encounters with the desired conditions, leading to lengthy project timelines.
Next, the team turned to knowledge graphs (KGs) to label data points with specific attributes or classifications. While this approach made organizing, retrieving, and analyzing data easier, the sheer variety of corner cases made it an enormous task to label each one uniquely.
Both methods had drawbacks, including high costs, low efficiency, and limited coverage.
With advancements in AI technologies, particularly large language models (LLMs) like ChatGPT, vector databases, and retrieval-augmented generation (RAG), BOSCH began to explore more efficient solutions. The team used large vision models (LVMs) and large multimodal models (LMMs) to convert collected images into vector embeddings. With these embeddings stored in a vector database, they could perform efficient text-to-image or image-to-image searches.
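To make the idea of embedding-based search concrete, here is a minimal sketch in plain Python. It assumes the embeddings already exist; the three-dimensional vectors, file names, and the `search` helper are hypothetical stand-ins, not BOSCH's data or any model's real output. A text query and an image query are handled the same way once both are vectors: rank stored images by cosine similarity to the query vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, image_vectors, top_k=2):
    """Brute-force search: score every stored vector, return the best IDs."""
    scored = [
        (cosine_similarity(query_vec, vec), image_id)
        for image_id, vec in image_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:top_k]]

# Hypothetical toy embeddings; real models emit 1,000+ dimensions.
image_vectors = {
    "fog_highway.jpg": [0.9, 0.1, 0.0],
    "clear_city.jpg": [0.0, 0.2, 0.9],
    "rain_pedestrian.jpg": [0.6, 0.7, 0.1],
}

# Text-to-image: stand-in for the embedding of "dense fog on a highway".
text_query_vec = [1.0, 0.2, 0.0]
text_results = search(text_query_vec, image_vectors)

# Image-to-image: reuse a stored image's vector as the query.
image_results = search(image_vectors["clear_city.jpg"], image_vectors, top_k=1)
```

Brute-force scoring like this grows linearly with the number of stored vectors, which is why an indexed vector database becomes necessary at billion scale.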
The team quickly identified suitable LMMs and LVMs for image embedding. The real challenge was scaling vector similarity search, which made the vector database a crucial component of the solution.
The Journey to Choosing Milvus as the Similarity Search Solution
BOSCH relies on pre-trained AI models with billions of parameters and feature dimensions exceeding 1,000. For instance, with a 1,024-dimensional feature vector stored as 4-byte floating-point values, each vector requires about 4KB of memory. When dealing with massive datasets, this storage requirement can lead to enormous resource consumption, driving up both storage and computational costs.
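As a rough check on these numbers, the short calculation below multiplies dimensions by bytes per value. The one-billion vector count is an illustrative assumption based on the "billions" scale mentioned in this case study, and the figure covers raw vectors only, without index overhead or replicas.

```python
BYTES_PER_FLOAT32 = 4  # one 32-bit floating-point value

def raw_vector_bytes(dim, num_vectors=1):
    """Bytes needed to store uncompressed float32 vectors."""
    return dim * BYTES_PER_FLOAT32 * num_vectors

per_vector = raw_vector_bytes(1024)           # 4,096 bytes, about 4KB
one_billion = raw_vector_bytes(1024, 10**9)   # raw size for 1e9 vectors

print(f"per vector: {per_vector} bytes")
print(f"one billion vectors: {one_billion / 1e12:.3f} TB")
```

At this scale, uncompressed vectors alone occupy roughly 4 TB per billion entries.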
The volume of BOSCH’s image data is immense—currently in the tens of billions and still growing. After clustering and deduplication, the data needed for similarity retrieval in a vector database still numbers in the billions.
To address this challenge, BOSCH implemented quantization indexing and sharding technologies to minimize resource use and enhance data processing efficiency. Quantization indexing is ideal for efficiently storing large-scale data and indexing high-dimensional features. Sharding handles growing data volumes, making large-scale real-time retrieval possible and optimizing computational resource use. The team explored several approaches:
HNSW (Hierarchical Navigable Small World) graph indexing: Many question-answering systems use HNSW graph indexing for natural language processing (NLP) tasks. Although it is a popular and straightforward method, HNSW requires storing the full high-dimensional features directly in the index, leading to high resource consumption and costs.
Vector search plugins on top of traditional databases: Adding vector fields to traditional relational databases is one of the available vector search solutions. However, for quantization index algorithms, sharding updates necessitate retraining codebooks, which adds complexity. Consequently, traditional databases with vector search functionality usually support only HNSW indexing, which does not meet BOSCH’s large-scale vector data processing and retrieval needs.
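To illustrate why quantization saves storage, here is a minimal sketch of scalar quantization, which maps each 4-byte float to a 1-byte code. This is a deliberately simpler relative of the codebook-based methods mentioned above, not BOSCH's or Milvus's actual implementation; the function names and toy vectors are hypothetical.

```python
def train_range(vectors):
    """Learn the value range from training data (the quantizer's 'model')."""
    values = [v for vec in vectors for v in vec]
    return min(values), max(values)

def _scale(lo, hi):
    # Width of one quantization step; fall back to 1.0 for constant data.
    return (hi - lo) / 255 if hi > lo else 1.0

def quantize(vec, lo, hi):
    """Map each float to an integer code in 0..255 (one byte per value)."""
    step = _scale(lo, hi)
    return bytes(min(255, max(0, round((v - lo) / step))) for v in vec)

def dequantize(codes, lo, hi):
    """Approximate reconstruction of the original floats."""
    step = _scale(lo, hi)
    return [lo + c * step for c in codes]

# Hypothetical toy vectors.
vectors = [[-1.0, 0.0, 0.5, 1.0], [0.25, -0.5, 0.75, -0.25]]
lo, hi = train_range(vectors)

codes = quantize(vectors[0], lo, hi)
restored = dequantize(codes, lo, hi)
max_error = max(abs(a - b) for a, b in zip(vectors[0], restored))

float32_bytes = 4 * len(vectors[0])   # 16 bytes uncompressed
compression_ratio = float32_bytes / len(codes)
```

The codes take a quarter of the space at the cost of a small reconstruction error. Values outside the trained range are clamped, which hints at why a quantizer may need retraining as the data distribution shifts.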
Gong Zhang, BOSCH’s principal software engineer, explained, “We need an indexing technology that can handle complex search requirements and generative models, reduce training costs, improve update efficiency, and adapt flexibly to evolving data and query needs.”
A specialized vector database emerged as the best solution for BOSCH’s needs. After evaluating various options, BOSCH chose Milvus as its vector search solution.
The Results: 80% Cost Reduction and Optimized Search Efficiency
Milvus is an open-source vector database that can store, index, and retrieve billions of vectors in milliseconds. Even with BOSCH’s vast and expanding volumes of data, Milvus maintains high performance. Most importantly, Milvus’s quantization indexing technology significantly reduces storage and computational resource consumption, making it easier for BOSCH to manage large-scale datasets.
80% Reduction in Data Collection Costs
Milvus’s efficient similarity search capabilities allow BOSCH to retrieve 70%-80% of the needed corner case data from existing databases, cutting down on the need for new data collection. Furthermore, Milvus enables nearly instant retrieval if the required data is already in the database, greatly improving data mining efficiency.
Zhang explained, “When we identify a need for specific corner case data, we can often find the required data in our database the same day using text or image search with Milvus. This greatly improves our data processing efficiency and has a positive effect on our business operations.”
Almost $1.4M Reduction in Annual Storage Costs
Reducing the need for external data collection has also substantially lowered storage costs. Zhang added, “Relying solely on external data collection could cost nearly 1.4 million dollars annually.”
Optimized Search Efficiency
Milvus’s quantization indexing technology greatly reduces storage and computation resource consumption. BOSCH can now process data more flexibly and efficiently, overcoming the performance limitations of traditional databases. Milvus also offers segmented and sharded search methods, enhancing efficiency and addressing current challenges with large-scale and high-dimensional data.
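The general idea behind sharded search can be sketched as a scatter-gather pattern: each shard returns its own top-k candidates, and a coordinator merges them into a global top-k. The code below is a simplified illustration under that assumption, using brute-force distance within each shard; the shard layout and function names are hypothetical and do not describe Milvus internals.

```python
import heapq

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_shard(shard, query, top_k):
    """Return the top_k (distance, id) pairs from a single shard."""
    scored = ((squared_l2(query, vec), vec_id) for vec_id, vec in shard.items())
    return heapq.nsmallest(top_k, scored)

def search_all_shards(shards, query, top_k):
    """Scatter the query to every shard, then merge the partial results."""
    partial = [hit for shard in shards for hit in search_shard(shard, query, top_k)]
    return [vec_id for _, vec_id in heapq.nsmallest(top_k, partial)]

# Hypothetical data split across two shards.
shards = [
    {"img_1": [0.0, 0.0], "img_2": [5.0, 5.0]},
    {"img_3": [1.0, 1.0], "img_4": [0.1, 0.0]},
]
nearest = search_all_shards(shards, [0.0, 0.0], top_k=2)
```

Because every shard contributes at least its own best k candidates, the merged result matches what a single unsharded search would return, while each shard only scans its own slice of the data.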
Millisecond-Level Retrieval for Billions of Data Points with a Scalable Architecture
BOSCH’s autonomous driving business is cloud-based. Milvus’s cloud-native architecture simplifies its deployment and scaling. It provides excellent scalability, which is crucial for BOSCH’s billion-level data operations. When its dataset expands, the team just needs one click to scale the needed resources. Zhang mentioned, “Even with numerous concurrent searches, we didn’t notice any slowdown in search speed.”
Active Community Support
Milvus is one of the most popular, rapidly evolving, and mature open-source vector databases, with a large and active user and developer community worldwide. Zhang commented, “The Milvus community is very active. Whenever we had issues, we got prompt responses from the community.”
Future Plans: Explore the Hybrid Search Capability of Milvus
To ensure data diversity, thousands of sample images are needed. Currently, BOSCH prioritizes text-to-image searches, resorting to image-to-image searches when text results are not good enough. Milvus’s support for multi-vector columns and hybrid searches makes on-demand image-to-image searches more feasible. For example, combining weather images with cone images helps find scenes with cones under various weather conditions, and pairing triangular road signs with descriptive text helps find signs with different warning functions. This is a direction BOSCH and Milvus will continue to explore together.
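One common way to combine results from several vector searches is reciprocal rank fusion (RRF). The sketch below assumes two separate searches have already produced ranked lists, one for a weather query and one for a cone query; the file names are hypothetical, and the case study does not say which fusion strategy BOSCH uses.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an item's score."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical results of two single-vector searches.
weather_hits = ["fog_cones.jpg", "fog_empty_road.jpg", "rain_cones.jpg"]
cone_hits = ["sunny_cones.jpg", "fog_cones.jpg", "rain_cones.jpg"]

fused = reciprocal_rank_fusion([weather_hits, cone_hits])
```

Images that rank well in both lists, such as the foggy scene with cones, rise to the top, while images that match only one query fall behind.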
Unlocking the Full Potential of Milvus in Autonomous Driving
Milvus isn't just a tool—it's a strategic ally for BOSCH in the autonomous driving arena. With Milvus, BOSCH can dive deeper into data and harness its power, giving them a crucial edge in the pursuit of smarter, safer driving. The adoption of Milvus has transformed how BOSCH handles data, making every step—from collection to processing to application—more efficient and accurate.
As BOSCH looks to the future, they're eager to explore more of Milvus's cutting-edge capabilities, driving forward the next generation of safer, smarter, and more convenient driving experiences.
Industry: Automotive