How MiniMax Scales Real-Time AI and Trillion-Scale Deduplication with Zilliz Cloud

30ms latency at 5,000+ QPS for real-time recommendations
3–5× cost reduction in training data deduplication workflows
2× faster LLM data preprocessing compared to legacy MapReduce systems
Petabyte-scale data deduplication using a native MinHash + LSH engine
About MiniMax
MiniMax is one of the leading providers of large language models, known for building multimodal AI systems and real-world applications at a global scale. Its consumer product, Talkie, is a conversational AI platform where users can create and interact with virtual agents. With tens of millions of monthly active users, Talkie has become one of the most widely adopted AI companion platforms in the world.
Behind the scenes, MiniMax also invests heavily in large model training and infrastructure. As the company scaled, so did its data challenges, from supporting high-concurrency, low-latency user experiences to managing petabytes of unstructured training data. MiniMax leverages Zilliz Cloud to address these challenges with data infrastructure that scales efficiently while delivering both performance and flexibility.
The Challenge: When Success Creates Impossible Infrastructure Demands
MiniMax's growth exposed a critical problem in AI infrastructure: traditional databases and data processing systems simply weren't built for the unique demands of modern AI applications.
Redis Couldn't Handle AI-Scale Vector Search
Talkie's explosive user growth created performance requirements that pushed traditional caching solutions beyond their limits. With tens of millions of monthly active users expecting instant, personalized recommendations, the platform needed to perform real-time semantic similarity matching on millions of pieces of content, such as voice packs, interactive messages, and conversation starters.
The system had to respond in under 30 milliseconds, even during peaks of 5,000+ queries per second. The Redis-based solution, which had worked adequately for thousands of users, failed to deliver at this scale: Redis's in-memory architecture made storing millions of vectors prohibitively expensive, and its lack of native vector operations forced the team to rely on external plugins that added latency and operational complexity.
Trillion-Token Deduplication Was Economically Impossible
Meanwhile, MiniMax's LLM data training pipeline faced an entirely different scaling crisis. Processing training datasets containing tens of trillions of tokens required sophisticated deduplication to ensure model quality—redundant content causes overfitting and poor generalization. But at this scale, traditional deduplication methods became economically and computationally impractical.
MapReduce-based approaches took weeks or months to process a single dataset, consuming enormous engineering resources and delaying model training cycles. Exact matching couldn't handle the computational load: naively comparing every pair of n documents costs on the order of n² operations, so even a billion documents implies roughly 5 × 10^17 comparisons. Semantic deduplication, meanwhile, carried processing overhead that made trillion-scale operations prohibitively expensive. As datasets grew toward petabyte scale, the preprocessing bottleneck threatened to make advanced model training economically unfeasible.
The Solution: Purpose-Built AI Infrastructure That Handles Both Extremes
MiniMax needed infrastructure designed for AI workloads from the ground up, rather than general-purpose systems retrofitted with AI capabilities. Zilliz Cloud provided exactly that: a unified platform capable of delivering both millisecond-level vector search performance and trillion-scale batch processing efficiency, eliminating the operational complexity of managing separate systems for different AI workload types.
Architecting for 5,000+ QPS: Native Vector Operations Replace Redis Workarounds
To support Talkie’s recommendation system at scale, MiniMax completely re-architected its vector search infrastructure around Zilliz Cloud's AI-native capabilities. The new system deployed eight compute units with seven replicas, providing both horizontal scalability and high availability under massive concurrent traffic.
Unlike Redis, which required external plugins and workarounds for vector operations, Zilliz Cloud provided native vector indexing and approximate nearest neighbor (ANN) search designed specifically for AI applications. MiniMax's existing 32-dimensional embeddings plugged directly into the system without preprocessing or external tooling. The entire recommendation pipeline—from embedding ingestion through index construction to real-time similarity search—operated through unified APIs optimized for AI workloads.
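To make that concrete, here is a minimal sketch of such a pipeline using pymilvus, the client library for Milvus and Zilliz Cloud. The collection name, similarity metric, and sample data are illustrative assumptions for this example, not MiniMax's actual configuration.

```python
# Minimal sketch: 32-dimensional embedding ingestion and ANN search on
# Zilliz Cloud via pymilvus. Names, metric, and data are illustrative.
from pymilvus import MilvusClient

client = MilvusClient(
    uri="https://<your-cluster>.zillizcloud.com",  # Zilliz Cloud endpoint
    token="<api-key>",
)

# Quick setup: creates the collection, schema, and a default ANN index
# for the vector field in one call.
client.create_collection(
    collection_name="talkie_content",  # hypothetical collection name
    dimension=32,                      # matches the existing 32-dim embeddings
    metric_type="COSINE",              # similarity metric is an assumption
)

# Ingest content embeddings (e.g., voice packs, greetings) as plain rows.
client.insert(
    collection_name="talkie_content",
    data=[
        {"id": 1, "vector": [0.12] * 32, "content_type": "voice_pack"},
        {"id": 2, "vector": [0.34] * 32, "content_type": "greeting"},
    ],
)

# Real-time similarity search for one query embedding, top-10 results.
hits = client.search(
    collection_name="talkie_content",
    data=[[0.11] * 32],                # query embedding
    limit=10,
    output_fields=["content_type"],
)
print(hits[0])
```

In production, the same insert and search calls run against the replicated cluster, so serving capacity scales with replica count rather than with application-side changes.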
This wasn't simply a database migration; it was a fundamental shift toward infrastructure purpose-built for AI-scale operations. Query latency was no longer constrained by memory limitations or plugin overhead—everything operated natively within a system designed for the speed and scale requirements of modern AI applications.
Advanced MinHash + LSH Engine Purpose-Built for Trillion-Scale Workloads
To address the scale and complexity of its training data pipeline, MiniMax worked closely with the Zilliz engineering team to implement a custom deduplication engine—natively embedded within Zilliz Cloud. The solution combined MinHash and Locality-Sensitive Hashing (LSH), allowing MiniMax to efficiently detect and eliminate redundant content across terabyte- and petabyte-scale datasets.
MinHash compressed each document into a compact signature, making it feasible to compare billions of documents without overwhelming compute resources. LSH then dramatically reduced the search space by bucketing similar signatures together, enabling fast identification of near-duplicates without expensive full-pair comparisons.
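MiniMax's engine runs natively inside Zilliz Cloud, but the underlying technique can be illustrated with the open-source datasketch library. The tokenization scheme, signature size, and similarity threshold below are assumptions for the sketch, not the production pipeline's settings.

```python
# Illustrative MinHash + LSH near-duplicate detection with the open-source
# datasketch library. This is a standalone sketch of the technique, not
# MiniMax's Zilliz Cloud-embedded engine.
from datasketch import MinHash, MinHashLSH

def minhash_signature(text: str, num_perm: int = 128) -> MinHash:
    """Compress a document into a compact MinHash signature."""
    m = MinHash(num_perm=num_perm)
    # Word-level shingles for simplicity; real pipelines often use n-grams.
    for token in set(text.lower().split()):
        m.update(token.encode("utf-8"))
    return m

docs = {
    "doc1": "the quick brown fox jumps over the lazy dog",
    "doc2": "the quick brown fox jumped over the lazy dog",  # near-duplicate
    "doc3": "large language models need deduplicated training data",
}

# LSH buckets similar signatures together, so near-duplicates surface
# without comparing every pair of documents.
lsh = MinHashLSH(threshold=0.5, num_perm=128)  # Jaccard threshold (assumed)
for doc_id, text in docs.items():
    lsh.insert(doc_id, minhash_signature(text))

# Querying doc1's signature returns its candidate near-duplicates,
# here doc1 itself and doc2.
print(lsh.query(minhash_signature(docs["doc1"])))
```

The key property is that lookup cost grows with the number of colliding signatures rather than with the square of the corpus size, which is what makes deduplication tractable at billions of documents.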
Rather than building a separate deduplication service, the MinHash + LSH engine operated natively within Zilliz Cloud's indexing system, using the same APIs for embedding insertion, index construction, and approximate queries. This eliminated the complexity of managing separate workflows while providing distributed horizontal scaling that could grow alongside MiniMax's expanding datasets.
Results: Faster Performance, Lower Costs, and Simpler Operations
The unified infrastructure approach delivered measurable improvements across both of MiniMax's mission-critical workloads.
Real-Time Recommendations for Talkie: <30ms Latency at Peak Scale
After moving off Redis, Talkie’s recommendation engine consistently hit its latency target—under 30 milliseconds, even during traffic surges above 5,000 queries per second. The vector-native architecture provided more accurate semantic matching out of the box, improving recommendation quality and ultimately driving higher user engagement.
The multi-replica setup eliminated the availability and stability issues they’d struggled with before. As Talkie scaled to tens of millions of users, the system stayed stable without performance drop-offs—critical for user retention and product growth.
By removing Redis’s costly in-memory requirements, MiniMax also saw a significant drop in infrastructure spend. Zilliz’s compute-based model gave the team more control, allowing them to scale resources up or down as needed—something that wasn’t possible with Redis’s fixed memory overhead.
Data Deduplication: 2× Faster, 3–5× More Efficient
The custom MinHash + LSH implementation transformed MiniMax's approach to training data management. Compared to their previous MapReduce systems, processing speed improved by 2× while costs dropped by 3–5×, making billion-document deduplication economically viable for routine operations.
More importantly, the solution improved training data quality by efficiently eliminating redundant content that previously caused model overfitting. Better data quality translates directly to improved model performance and generalization capabilities—the ultimate measure of success for an AI research organization.
The unified API approach streamlined operations significantly. With deduplication fully integrated into the same system handling embeddings and similarity search, MiniMax eliminated separate tooling, reduced pipeline complexity, and gained operational simplicity that scales alongside their growing datasets.
The team has since applied the MinHash + LSH capabilities to additional preprocessing workflows beyond the original deduplication use case, maximizing return on their infrastructure investment while supporting new AI research initiatives.
Looking Forward: Scaling AI with Confidence
With Zilliz Cloud in place, MiniMax is now expanding its vector infrastructure to support new AI products beyond Talkie. The team is building out multimodal capabilities, reusing the same vector-native foundation to support image, audio, and text embeddings across use cases.
The MinHash + LSH engine is being extended to additional data pipelines, enabling faster iteration on model training and dataset refinement. As MiniMax continues to grow, Zilliz Cloud gives them the flexibility to scale without re-architecting, positioning them to adopt future Zilliz features with minimal overhead.