How Tokopedia Achieved a 10x Smarter Search Experience Using Milvus
scalability and reliability
Our search system has become much more intelligent, stable, and reliable with Milvus.
Tokopedia is Indonesia's largest e-commerce platform, boasting a staggering 90 million monthly active users and an impressive network of 8.6 million merchants. With a reach extending to 98% of Indonesia's administrative regions, Tokopedia has become the country's go-to destination for online shopping.
Tokopedia recognizes that the value of its extensive product catalog lies in helping buyers effortlessly discover products tailored to their preferences. To improve the relevance of search results, the team introduced similarity search on Tokopedia.
When users open the search results page on a mobile device, they will notice a discreet "…" button. Tapping this button opens a menu that lets them explore products that closely match the one they are viewing.
Challenges with keyword-based search
In the past, Tokopedia Search utilized Elasticsearch as its primary engine for product search and ranking. Each search request initiated a query to Elasticsearch, which ranked products based on the user's search keyword. Elasticsearch stores the keywords as sequences of numerical values, representing ASCII or UTF codes for individual letters. It constructs an inverted index for the swift identification of documents containing words from the user's query and subsequently determines the best matches using a range of scoring algorithms.
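The keyword lookup described above can be illustrated with a minimal sketch. It is not Elasticsearch's implementation: the tokenizer is a plain whitespace split, the score is a raw term count rather than a production scoring algorithm, and the sample product titles are made up for illustration.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(query, docs, index):
    """Rank documents by how many times they contain the query tokens."""
    tokens = query.lower().split()
    # Candidate documents contain at least one query token.
    candidates = set()
    for token in tokens:
        candidates |= index.get(token, set())
    scores = {}
    for doc_id in candidates:
        counts = Counter(docs[doc_id].lower().split())
        scores[doc_id] = sum(counts[t] for t in tokens)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical product titles, used only for illustration.
docs = {
    1: "red running shoes",
    2: "blue sneakers for running",
    3: "red leather wallet",
}
index = build_inverted_index(docs)
results = search("red shoes", docs, index)
```

In this toy catalog, a query for "red shoes" never reaches the sneakers listing because the two titles share no token, which is the kind of gap a purely lexical match leaves open.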
However, these scoring algorithms don't usually consider the semantics of the keywords searched for. Instead, they focus on factors like how often the words appear in the documents, how close they are to each other, and other statistical information. Although humans can understand the meaning behind the ASCII representation of words, computers need a reliable algorithm to compare the semantics of ASCII-encoded words.
One solution the Tokopedia team found was to create a new representation for keywords, one that captures not only the letters in a word but also information about its meaning. For example, they could encode the words commonly used alongside the search keyword to provide probable context. From there, they can assume that similar contexts indicate similar concepts and compare them using mathematical techniques. It is even possible to encode entire sentences based on their meaning.
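To make the idea of encoding a keyword by its context concrete, here is a small sketch that uses simple co-occurrence counts and cosine similarity. This is an assumption for illustration only: the post does not say which embedding model Tokopedia used, and the sample queries are hypothetical.

```python
import math
from collections import Counter

def context_vectors(queries):
    """For each word, count the other words that appear in the same query."""
    vectors = {}
    for query in queries:
        tokens = query.lower().split()
        for i, word in enumerate(tokens):
            context = tokens[:i] + tokens[i + 1:]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical search queries, used only for illustration.
queries = [
    "running shoes men",
    "running sneakers men",
    "leather wallet men",
    "leather wallet women",
]
vectors = context_vectors(queries)
shoes_vs_sneakers = cosine(vectors["shoes"], vectors["sneakers"])
shoes_vs_wallet = cosine(vectors["shoes"], vectors["wallet"])
```

Here "shoes" and "sneakers" never appear together, yet they score as more similar to each other than "shoes" and "wallet" because they share the same neighboring words.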
Selecting Milvus as the vector similarity search engine
Now that Tokopedia possesses feature vectors, the remaining challenge lies in efficiently retrieving vectors from the extensive dataset that closely match the target vector. In exploring vector search engines, we conducted proof-of-concept (POC) evaluations on several vector search stacks available on GitHub, including FAISS, Vearch, and Milvus.
Based on our load testing results, we preferred Milvus. Compared to Milvus, FAISS operates more as an underlying library and is consequently less user-friendly. As we delved deeper into Milvus, we adopted it for the following reasons:
- Milvus proved remarkably user-friendly. We found that you only need to pull its Docker image and adjust the parameters to suit your specific use case.
- Milvus offers a broader range of supported indexes. Besides FAISS, HNSW, DISK_ANN, and ScaNN, there are 11 indexes to choose from.
- Milvus provides comprehensive documentation to aid users in their implementation.
In a nutshell, Milvus is user-friendly, with clear documentation and reliable community support for any issues that may arise.
Milvus in production
After implementing Milvus as their feature vector search engine, they used it in their Ads service to match low-fill rate keywords with high-fill rate keywords. They configured a standalone node in a development (DEV) environment, where it ran smoothly and delivered an impressive 10x higher click-through rate (CTR) and conversion rate (CVR).
However, a potential concern arose: if the standalone node crashed, the entire service would become inaccessible. Hence, the Tokopedia team switched to a high-availability (HA) implementation of Milvus.
Milvus offers two tools: Mishards, a cluster sharding middleware, and Milvus-Helm for streamlined configuration. At Tokopedia, they use Ansible playbooks for infrastructure setup, prompting them to create a playbook to orchestrate the infrastructure. The diagram below shows how Mishards works.
Mishards cascades each request from upstream to its downstream sub-modules: it splits the upstream request, gathers the results from the sub-services, and then returns the combined results to the upstream source.
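The request flow described above follows a scatter-gather pattern, which can be sketched roughly as follows. This is not Mishards' code: plain in-process dictionaries stand in for the downstream nodes, the shards are queried sequentially rather than in parallel, and all names are hypothetical.

```python
import heapq
import math

def search_shard(shard, query, top_k):
    """Return the top_k (distance, id) pairs from one shard."""
    scored = ((math.dist(query, vec), vec_id) for vec_id, vec in shard.items())
    return heapq.nsmallest(top_k, scored)

def scatter_gather(shards, query, top_k):
    """Send the query to every shard, then merge the partial results."""
    partial = []
    for shard in shards:
        partial.extend(search_shard(shard, query, top_k))
    # The global top_k is guaranteed to be among the per-shard top_k lists.
    return heapq.nsmallest(top_k, partial)

# Two hypothetical shards, each holding a slice of the vector data.
shards = [
    {"a": (0.0, 0.0), "b": (5.0, 5.0)},
    {"c": (1.0, 0.0), "d": (9.0, 9.0)},
]
merged = scatter_gather(shards, query=(0.0, 0.1), top_k=2)
ids = [vec_id for _, vec_id in merged]
```

Because each shard returns its own best `top_k` candidates, the merge step only has to rank a small combined list instead of the full dataset.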
The architecture of the Mishards-based cluster solution is shown below.
The Tokopedia semantic search service system includes one writable node, two read-only nodes, and one Mishards middleware instance, all deployed in GCP using Milvus Ansible. The system has become considerably smarter, more stable, and more reliable.
How does vector indexing accelerate similarity search?
Efficiently querying large vector datasets in similarity search engines requires proper indexing. This process organizes the data and speeds up the search process, making it essential for handling datasets with millions, billions, or even trillions of vectors. Once you index a massive vector dataset, you can direct queries to clusters or subsets of data most likely to contain vectors similar to the input query. However, this approach might sacrifice accuracy to achieve faster queries on big vector data.
To better understand, think of indexing as alphabetically sorting words in a dictionary. When looking up a keyword, you can quickly navigate to a section containing only words with the same initial letter, dramatically accelerating the search for the input word's definition.
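The same idea can be sketched for vectors with a simplified bucket-based index, loosely in the style of an inverted-file (IVF) index. This is an illustrative assumption rather than the index Tokopedia configured: the centroids are fixed by hand instead of being learned, and the `nprobe` parameter name is borrowed here only to mean "number of buckets to scan".

```python
import math

def build_index(vectors, centroids):
    """Assign every vector to the bucket of its nearest centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(vec, centroids[i]))
        buckets[nearest].append(vec_id)
    return buckets

def search(query, vectors, centroids, buckets, nprobe=1):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))
    candidates = [vec_id for i in order[:nprobe] for vec_id in buckets[i]]
    best = min(candidates, key=lambda v: math.dist(query, vectors[v]), default=None)
    return best, len(candidates)

# Hypothetical 2-D vectors; real feature vectors have far more dimensions.
vectors = {
    "a": (0.0, 0.0), "b": (1.0, 1.0), "c": (4.9, 5.0),
    "d": (10.0, 10.0), "e": (9.0, 11.0),
}
# Centroids fixed by hand; a real index would learn them, e.g. with k-means.
centroids = [(0.0, 0.0), (10.0, 10.0)]
buckets = build_index(vectors, centroids)

query = (5.6, 5.6)
fast_hit, fast_scanned = search(query, vectors, centroids, buckets, nprobe=1)
full_hit, full_scanned = search(query, vectors, centroids, buckets, nprobe=2)
```

With `nprobe=1` the search scans only two vectors but returns "d", while scanning both buckets finds the true nearest neighbor "c". That is the speed-for-accuracy trade-off in miniature: a query that falls near a bucket boundary can miss its best match unless more buckets are probed.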
Tokopedia's quest for superior search functionality led them to Milvus, a game-changer in semantic search. With Milvus, they unlocked the power of vector representation and built a 10x smarter search system that has dramatically enhanced the user experience. Their search service is also highly available, ensuring seamless operations. This journey with Milvus has transformed Tokopedia's search, promising a future of personalized and meaningful search results. With Milvus, they are revolutionizing e-commerce in Indonesia and beyond.
*This post was written by Rahul Yadav, a Software Engineer at Tokopedia. It is edited and reposted here with permission.*