How Milvus Transformed BIGO's Video Deduplication System for Optimal Throughput and User Experience
<200ms search response time with a high recall rate
>700 million embedding vectors indexed and managed
Significantly increased query throughput without compromising performance
Milvus has done an extraordinary job in revolutionizing Likee's video deduplication system, which significantly fueled the growth of BIGO's short-video business.
Xinyang Guo
About BIGO
BIGO Technology (BIGO) is a rapidly expanding tech company based in Singapore with over 30 offices and six R&D centers worldwide. Powered by Artificial Intelligence technologies, BIGO offers video-based products and services such as Bigo Live for live streaming and Likee for short video sharing and has become hugely popular with over 400 million users across 150 countries.
Challenges: Removing Massive Amounts of Duplicate Videos
Likee is an incredible global platform allowing users to express themselves and share their moments through short videos. However, with tens of millions of users generating videos daily, Likee faces a significant challenge in improving user experience and recommending high-quality content. One of the biggest challenges Likee must overcome is the sheer amount of duplicate videos uploaded to the platform.
To tackle this issue, Likee needs a solution that detects and removes duplicate videos promptly and efficiently. Such a process is complicated and demands a comprehensive understanding of each video's distinct characteristics and the ability to compare and contrast them swiftly.
Previously, Likee utilized Faiss, a library for similarity search and clustering of dense vectors. However, Faiss struggled to manage massive amounts of vectors and had slow query response and limited query throughput. So, the Likee team urgently needed a more efficient solution for similarity search and detection.
Solution: Empowering Video Similarity Search with Milvus
Milvus is an open-source vector database purpose-built to store, index, and query embedding vectors, with fast similarity search as its core feature. With Milvus, Likee's engineering team built a more efficient deduplication system that completes searches in under 200ms while maintaining a high recall rate. Likee also benefited from Milvus' scalability, which improved vector query throughput and overall working efficiency.
How Likee identifies duplicate videos
Likee’s deduplication system extracts 15-20 frames from every newly uploaded video and converts each frame to a feature vector. The system then searches a database of over 700 million vectors, which correspond to all existing videos, for the top-k most similar vectors. From these results, it determines which videos are duplicates and need to be removed.
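To make the top-k search concrete, here is a minimal sketch in plain Python. It is not Likee's implementation and does not call Milvus: it scans every stored vector exhaustively, whereas a vector database uses an index to avoid that at the scale of hundreds of millions of vectors. The cosine metric, the function names, and the toy IDs and values are all assumptions made for illustration; the source does not state which similarity metric Likee uses.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, stored, k):
    """Return the k (vector_id, score) pairs most similar to `query`.

    `stored` maps a vector ID to its vector. This is an exhaustive scan,
    used here only to show what "top-k most similar" means.
    """
    scored = ((vid, cosine_similarity(query, vec)) for vid, vec in stored.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy data: three stored frame vectors (hypothetical IDs and values).
stored_vectors = {
    "frame-a": [1.0, 0.0, 0.0],
    "frame-b": [0.9, 0.1, 0.0],
    "frame-c": [0.0, 1.0, 0.0],
}
hits = top_k_similar([1.0, 0.0, 0.0], stored_vectors, k=2)
print(hits)
```

In this toy run the query matches `frame-a` exactly and `frame-b` closely, so those two are returned and the orthogonal `frame-c` is left out.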
The diagram below illustrates the structure of Likee's deduplication system. First, new videos are written to Kafka, a distributed event streaming platform, and consumed by Kafka consumers. The system then uses deep learning models to convert the videos into embeddings and sends them to the similarity auditor. Before being loaded for further searches, the embeddings are indexed by Milvus and stored in Ceph. Finally, the system stores the video IDs that correspond to those embeddings in TiDB, a distributed SQL database, or Pika, a Redis-compatible key-value store.
The architecture of Likee's deduplication system
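The ingestion part of this data flow can be sketched with in-memory stand-ins. Everything in the block is hypothetical: a `queue.Queue` plays the role of the Kafka topic, `embed_frame` is a trivial placeholder for the deep learning model, and two dictionaries stand in for the Milvus index and the TiDB/Pika ID store. The similarity auditor and Ceph storage are left out to keep the sketch short.

```python
import queue
from dataclasses import dataclass


@dataclass
class Upload:
    video_id: str
    frames: list  # each frame is a list of numbers in this toy example


def embed_frame(frame):
    """Placeholder for the embedding model: a trivial 2-d 'embedding'."""
    return [sum(frame) / len(frame), max(frame) - min(frame)]


def run_pipeline(uploads, vector_index, id_store):
    """Drain the upload queue, embed each frame, and record vector -> video ID."""
    while not uploads.empty():
        upload = uploads.get()
        for i, frame in enumerate(upload.frames):
            vector_id = f"{upload.video_id}:{i}"
            vector_index[vector_id] = embed_frame(frame)  # stand-in for Milvus
            id_store[vector_id] = upload.video_id         # stand-in for TiDB/Pika


uploads = queue.Queue()  # stand-in for a Kafka topic
uploads.put(Upload("video-1", [[0.0, 1.0], [0.5, 0.5]]))
vector_index, id_store = {}, {}
run_pipeline(uploads, vector_index, id_store)
print(vector_index)
print(id_store)
```

Keeping the vector-to-video mapping in a separate store, as the architecture does, means a search result can be resolved to a video ID with a plain key lookup.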
How Milvus empowers Likee’s similarity search
The diagram below illustrates the steps involved in a similarity search procedure.
- To conduct a video similarity search, Milvus first performs a batch search to recall the top 100 vectors similar to each feature vector extracted from a new video. Each similar vector is associated with its corresponding video ID.
- Next, duplicate entries among the recalled results are removed by comparing video IDs, and the feature vectors of the remaining videos are retrieved from TiDB or Pika.
- Finally, Milvus calculates and scores the similarity between the retrieved feature vectors and those of the query video. The video ID with the highest score is returned as a result.
How Milvus helps Likee’s similarity search
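The three steps above can be sketched as follows, again in plain Python with an exhaustive scan in place of a Milvus index. The scoring rule in step 3 (the mean, over the query frames, of each frame's best inner product within a candidate video) is an assumption; the source does not describe how the final score is computed. The function names and the toy data are hypothetical as well.

```python
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def recall_top_n(query_vec, index, n):
    """Step 1 for one query vector: IDs of the n highest-scoring stored vectors."""
    ranked = sorted(index, key=lambda vid: dot(query_vec, index[vid]), reverse=True)
    return ranked[:n]


def find_best_match(query_vecs, index, vector_to_video, n=100):
    # Step 1: batch recall, one top-n list per frame vector of the new video.
    recalled = [recall_top_n(q, index, n) for q in query_vecs]

    # Step 2: collapse recalled vectors to distinct candidate video IDs,
    # then gather every stored vector that belongs to each candidate.
    candidates = {vector_to_video[vid] for hits in recalled for vid in hits}
    if not candidates:
        return None
    vectors_by_video = defaultdict(list)
    for vid, video_id in vector_to_video.items():
        if video_id in candidates:
            vectors_by_video[video_id].append(index[vid])

    # Step 3: score each candidate (assumed rule: mean of each query frame's
    # best match within the candidate) and return the highest-scoring one.
    def score(video_id):
        frames = vectors_by_video[video_id]
        return sum(max(dot(q, f) for f in frames) for q in query_vecs) / len(query_vecs)

    best = max(candidates, key=score)
    return best, score(best)


# Toy data: two stored videos, one of which is identical to the query.
index = {"a0": [1.0, 0.0], "a1": [0.0, 1.0], "b0": [0.6, 0.8]}
vector_to_video = {"a0": "video-a", "a1": "video-a", "b0": "video-b"}
query = [[1.0, 0.0], [0.0, 1.0]]
best_video, best_score = find_best_match(query, index, vector_to_video, n=2)
print(best_video, best_score)
```

A production system would also compare the best score against a threshold before flagging a duplicate; that threshold is not given in the source, so it is omitted here.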
Results: Improved Query Throughput with Faster Search Response
Milvus, a high-performance vector search engine, has played a vital role in Likee's video deduplication system, improving user experience and supporting the growth of BIGO's short-video business. Using Milvus, Likee can complete a search in less than 200ms while maintaining a high recall rate. Milvus is also horizontally scalable, which lets Likee significantly increase vector query throughput and system efficiency without compromising performance.
In addition to video deduplication, BIGO plans to use Milvus for more video-related purposes, such as sentiment analysis, object recognition, and personalized video recommendation. BIGO and Milvus are excited to expand their cooperation in these areas and beyond.
We plan to expand the use of Milvus into fields such as content moderation and restriction and customized video services. BIGO and Milvus working together will benefit both businesses, and I look forward to seeing Milvus and its community keep growing and prospering.
Xinyang Guo