How Solvely.ai Scales AI Learning Worldwide with Lightning-Fast Vector Search on Zilliz Cloud

70% Lower Latency
Sub-100 ms vector retrieval, even during peak traffic
4–5× Faster Answers
Instant matching to expert-reviewed solutions for a better learner experience
~60% Lower Infra Costs
Cost-efficient scaling to hundreds of millions of questions
Zero Downtime
Stable, reliable performance under global high-concurrency workloads
From a performance standpoint, Zilliz Cloud's retrieval speed far exceeds our existing system. We achieved approximately 70% reduction in retrieval latency, which translates to a 4-5x improvement in overall problem-solving time when we successfully match original questions. Whether measured by speed, cost, or overall value, Zilliz Cloud perfectly met our expectations.
Dr. Nick Yuan, CTO at Solvely
About Solvely
Solvely.ai is an AI-powered learning platform serving nearly 10 million students, educators, and professionals—spanning K–12, higher education, and lifelong learners. Known for its strength in math, business, medical, and life sciences, as well as STEM subjects, Solvely transforms learning materials into instant explanations, personalized practice, and multimodal study guides.
What sets Solvely apart is its hybrid approach: AI models generate intelligent solutions while cross-referencing a vast library of expert-validated content, making it a trusted tool for learners seeking accuracy. But as that question bank grew to hundreds of millions and its user base continued to scale rapidly, delivering fast, reliable answers became a major engineering challenge. That pressure ultimately led the team to adopt Zilliz Cloud vector database as the engine behind their vector retrieval.
With Zilliz Cloud, Solvely now delivers faster responses, lower latency, and a smoother learning experience—helping millions of learners get the support they need, exactly when they need it. As Solvely continues to expand its product offerings and global reach, Zilliz Cloud provides a scalable, cost-efficient vector foundation that keeps the platform performing at its best, bringing Solvely’s vision of accessible, high-quality learning one step closer to reality.
Growing Pains with the Legacy System
One of Solvely’s core functionalities relies on quickly matching student-submitted problems against a curated database of verified, high-quality questions and answers. This approach combines the reliability of a structured question bank with the flexibility and reasoning capabilities of large language models.
To make this possible, Solvely used vector similarity search from the very beginning. Traditional keyword-based and template-based systems could only match text literally, missing similar questions phrased slightly differently or presented in different ways. With vector search, Solvely could embed a student’s math or science question and retrieve conceptually similar problems—supporting both accurate, curated solutions and improved AI reasoning through example-based retrieval. This required two key capabilities from their vector infrastructure: large-scale offline clustering to group millions of questions by concept, and fast, reliable online search to support real-time homework workflows.
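The difference between literal keyword matching and vector search can be illustrated with a minimal sketch. The embeddings and question bank below are toy values invented for illustration; in production, the vectors come from a learned embedding model and are stored and searched in Zilliz Cloud, not in an in-memory list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy question bank with hand-made 3-d "embeddings" (hypothetical values).
question_bank = [
    ("What is the derivative of x^2?",      [0.9, 0.1, 0.0]),
    ("Balance the equation H2 + O2 -> H2O", [0.1, 0.9, 0.1]),
    ("Differentiate f(x) = x squared",      [0.88, 0.12, 0.02]),
]

def top_match(query_vec):
    """Return the bank question whose embedding is closest to the query."""
    return max(question_bank, key=lambda item: cosine(query_vec, item[1]))

# A rephrased calculus question lands near the calculus cluster even
# though it shares no keywords with the stored phrasing.
query = [0.85, 0.15, 0.05]
print(top_match(query)[0])  # -> "Differentiate f(x) = x squared"
```

A keyword matcher would miss this pairing entirely; the vector representation captures that both questions are about differentiation.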
In the early stages, existing services handled these needs well. With a smaller dataset and lower traffic volume, query latency and costs were manageable, and the system’s simple API helped the team move quickly. But scale brings complexity. With hundreds of millions of questions in their library and millions of users relying on the platform, performance and cost began to drift away from what the platform required. Latency, which was once a few hundred milliseconds, exceeded one second during peak hours when many students submitted queries simultaneously. These delays had a direct impact on the student experience.
Costs became a problem as well. Maintaining acceptable performance required upgrading to significantly more expensive tiers, and the existing system’s pricing model caused storage and search costs to increase faster than Solvely’s actual usage. Eventually, the team reached a point where the legacy system was no longer sustainable. Solvely needed lower latency, more predictable scaling, and a cost structure suitable for a rapidly growing global education platform. These combined performance and cost pressures pushed them to evaluate alternative vector databases better suited for high-volume, cost-sensitive AI applications.
Why Zilliz Cloud
When Solvely began evaluating alternative vector databases, Zilliz Cloud quickly became a top contender. The team already had extensive experience with Milvus—the widely adopted open-source vector database built by the Zilliz team—during their early development phases. That familiarity gave Solvely confidence in both the technology and the broader ecosystem as they considered moving to a fully managed solution.
Their evaluation focused on three practical criteria:
Retrieval speed under high concurrency
Cost efficiency at scale
Operational simplicity
To get an accurate comparison, Solvely migrated a representative slice of its data into Zilliz Cloud and ran benchmark tests directly against its existing deployment. The results were clear:
Zilliz Cloud delivered 2–3× faster retrieval speeds under identical load.
Latency dropped from 1+ seconds to under 100 ms, even during peak concurrency.
Infrastructure costs fell by roughly 60%, thanks to Zilliz Cloud’s more efficient resource utilization and favorable pricing model.
Operational simplicity proved just as important as raw performance. With their question bank expanding into the hundreds of millions, Solvely needed a service that scaled smoothly without requiring additional engineering overhead. Zilliz Cloud met that need, allowing the team to focus on improving the student learning experience rather than on maintaining backend infrastructure.
“We wanted something that could support us going live quickly, out of the box,” said Dr. Nick Yuan, CTO at Solvely.
Beyond speed and cost, Zilliz Cloud’s feature set offered the flexibility Solvely needed as its platform continued to grow. Partition and cluster management allowed them to organize their massive question database by subject and content type. Auto-scaling—both dynamic scaling based on real-time load and scheduled scaling for predictable traffic spikes—ensured consistently strong performance during peak homework hours.
The Solution: Powering Solvely’s AI Learning System with Zilliz Cloud
Solvely’s system operates as a single, end-to-end Retrieval-Augmented Generation (RAG) pipeline optimized for educational problem-solving. At a high level, the pipeline consists of two tightly connected phases:
Preparing a large, high-quality question bank in advance
Performing low-latency semantic retrieval in real time when students submit questions
Zilliz Cloud serves as the vector retrieval layer throughout the pipeline, supporting both large-scale offline indexing and high-concurrency online search.
Preparing the Question Bank
Before any live queries are served, Solvely processes and organizes hundreds of millions of questions from multiple sources, including student-uploaded homework photos and expert-curated datasets. Because these inputs vary widely in structure and quality, they must be normalized and enriched before they can be searched reliably at scale.
Content ingestion: Homework images and manually authored questions are entered into the system in different formats. Solvely cleans, deduplicates, and standardizes this content so it can be processed uniformly and indexed consistently in Zilliz Cloud.
Subject-aware normalization: Each question is processed within its academic domain to preserve subject-specific structure and meaning, rather than flattening it into generic text. For example:
Chemistry: molecular formulas, element symbols, and reactions are kept intact
Geometry: spatial relationships and diagram-related information are preserved
Humanities: narrative flow and contextual meaning are maintained
Embedding generation: Solvely generates embeddings for the entire question corpus using Google or OpenAI models. These vectors are stored and indexed in Zilliz Cloud in advance, forming the foundation for low-latency semantic retrieval at query time.
Direct integration with Zilliz Cloud: The generated vectors and metadata are written directly into Zilliz Cloud. By keeping the pipeline lightweight and avoiding complex orchestration tools, Solvely maintains better performance control and can fine-tune the system for different subjects.
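The offline preparation steps above can be sketched as a small Python pipeline. This is an illustrative simplification under stated assumptions: the function names are hypothetical, `embed` is a stand-in for a Google or OpenAI embedding call, and the normalization rules are reduced to toy heuristics; Solvely's actual subject-aware logic is far richer.

```python
import hashlib

def normalize(subject, text):
    """Simplified subject-aware cleanup: collapse whitespace, and lowercase
    only subjects where casing carries no meaning. (Real normalization
    preserves formulas, element symbols, and diagram cues per subject.)"""
    text = " ".join(text.split())
    return text if subject in ("chemistry", "geometry") else text.lower()

def embed(text, dim=8):
    """Stand-in for a Google/OpenAI embedding call (deterministic stub)."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

def build_records(raw_items):
    """Turn raw (subject, question) pairs into insert-ready records,
    deduplicating on the normalized text."""
    seen, records = set(), []
    for subject, text in raw_items:
        clean = normalize(subject, text)
        if clean in seen:
            continue  # drop duplicates before they reach the index
        seen.add(clean)
        records.append({"subject": subject, "text": clean, "vector": embed(clean)})
    return records

raw = [
    ("math", "What is  2+2?"),
    ("math", "what is 2+2?"),          # duplicate after normalization
    ("chemistry", "Balance H2 + O2 -> H2O"),
]
records = build_records(raw)
print(len(records))  # 2 -- the duplicate math question is dropped
```

In the real pipeline, the resulting records (vectors plus metadata) are then written directly into a Zilliz Cloud collection, for example via the pymilvus `MilvusClient.insert` API.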
Real-Time Retrieval During Student Workflows
When a student submits a homework question, the same prepared infrastructure is activated in real time. This online workflow must be fast, reliable, and capable of handling complex academic input under high concurrency.
Preprocessing the question:
If the question is submitted as an image, OCR first extracts the text. The system then identifies formulas, symbols, and diagram-related cues and converts the input into a clean, standardized representation suitable for embedding.
Vector search with Zilliz Cloud:
The processed question is converted into a high-dimensional vector (1,000+ dimensions) using Google or OpenAI embedding models and sent to Zilliz Cloud for similarity retrieval. This allows the system to search by meaning rather than exact wording.
Solvely then performs two types of complementary retrievals:
Background knowledge search: Pulls in subject-specific background information, such as chemical constants, math identities, or relevant reference material. This grounding helps the LLM reason more accurately and reduces unsupported or hallucinated answers.
Similar question search: Finds previously solved, human-reviewed questions from Solvely’s database. These candidates are reranked by an LLM to capture subtle similarities that vector search alone may miss, ensuring the most relevant examples are used.
Subject-aware use of retrieved content:
Solvely applies different rules depending on the subject. For math and science, retrieved examples help the AI understand the solution method without copying exact numbers or answers. For humanities, retrieved material provides background and context to support explanation and interpretation rather than giving a fixed answer.
Query reformulation for higher answer quality:
Finally, the system may rephrase the original question to capture its broader intent—for example, focusing on the underlying concept rather than the exact wording. This helps retrieve useful context that may not be a direct text match but is essential for solving the problem correctly.
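The query-time flow (reformulate, retrieve, rerank) can be sketched end to end. Everything here is a hypothetical stand-in: `reformulate` fakes the LLM rewrite with a string heuristic, `vector_search` fakes Zilliz Cloud similarity search with word overlap, and `rerank` fakes LLM reranking with a term count. The structure of the pipeline, not the scoring, is the point.

```python
def reformulate(question):
    """Broaden the query toward the underlying concept
    (toy heuristic; production uses an LLM rewrite)."""
    return question.replace("What is", "How do you compute").rstrip("?")

def vector_search(query, bank, k=2):
    """Stand-in for Zilliz Cloud similarity search: score by shared words."""
    query_words = set(query.lower().split())
    def score(item):
        return len(query_words & set(item.lower().split()))
    return sorted(bank, key=score, reverse=True)[:k]

def rerank(query, candidates):
    """Stand-in for LLM reranking: prefer candidates containing the
    query's final term. (A real reranker scores semantic fit.)"""
    key_term = query.split()[-1].rstrip("?").lower()
    return sorted(candidates, key=lambda c: c.lower().count(key_term), reverse=True)

bank = [
    "How do you compute the derivative of x^3",
    "Balance the reaction H2 + O2 -> H2O",
    "How do you compute the integral of x",
]
q = "What is the derivative of x^3?"
candidates = vector_search(reformulate(q), bank)
best = rerank(q, candidates)[0]
print(best)  # -> "How do you compute the derivative of x^3"
```

Reformulation widens the candidate pool, retrieval narrows it by meaning, and reranking picks the example that best matches the specific problem.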
The Migration Process Was Surprisingly Smooth
One of Solvely’s biggest concerns about switching databases was the migration itself. They had hundreds of millions of questions stored in the existing system—how long would it take to move all that data? Would it require writing complex migration scripts? Would there be downtime affecting users?
In practice, the migration was remarkably smooth. Zilliz Cloud provided built-in migration tools that connect directly to their previous system. The process was essentially one-click—configure the connection, specify what to migrate, and let the pipeline run. The tools handled the heavy lifting of vector transfer, metadata management, and structural preservation. The team didn't need to write any custom code or orchestrate a complex data pipeline.
Results & Impact
After migrating to Zilliz Cloud, Solvely observed measurable improvements across multiple dimensions:
70% lower latency: Retrieval-stage latency decreased by approximately 70% compared to the previous deployment. During peak traffic, queries that previously took over a second now complete in tens to low hundreds of milliseconds.
~60% lower infrastructure cost: Monthly infrastructure costs for vector search dropped by roughly 60% immediately after migration, while handling equivalent or greater query volumes.
Better search accuracy: For subjects where LLMs traditionally struggle, such as chemistry, geometry, and calculus, the RAG-based approach significantly improved solution accuracy. Given that the baseline model’s performance is already strong, this incremental gain is significant.
Zero downtime, high availability: Since migrating, they've experienced zero downtime and minimal performance issues. The system handles varying load conditions smoothly. When they do encounter questions or want to optimize something, they get fast responses from a support team of technical experts who understand their use case well.
Beyond performance improvements, Solvely’s engineering team also saw clear operational benefits. Zilliz Cloud’s documentation and examples made it easy to get started, and the support team responded quickly when issues came up. Features like automatic and scheduled scaling reduced the day-to-day work of managing infrastructure, so the team could focus more on building the product.
Lessons Learned
Solvely’s experience highlights a few practical takeaways for teams building similar AI-powered retrieval systems:
Similar questions matter as much as exact matches. The team initially expected to rely heavily on exact question matches. In practice, similar questions with minor variations (such as changed numerical values) proved equally valuable. Providing these as context to the LLM improved answer quality even when no exact match existed.
Rewriting queries helps find more relevant results. Instead of embedding the user’s original question as-is, rewriting it to better match how data is stored in the vector database led to better search results.
Reranking results after retrieval improves accuracy. Using an LLM to evaluate and rank retrieved candidates before final response generation helped surface the most relevant matches, particularly for questions involving visual elements like diagrams or formulas.
Text-based retrieval still works well. While multimodal embedding is an active area of interest, the team found that OCR followed by text embedding delivered more reliable results than current image embedding approaches for their educational use case.
Managed services accelerate iteration. Choosing a fully managed vector database allowed the team to focus engineering effort on their core product rather than infrastructure operations.
Looking ahead, Solvely plans to test hybrid search, which combines semantic search with keyword search—especially helpful for course materials where exact terms matter. They’re also keeping an eye on improvements in multimodal embedding, which may eventually allow direct image-to-image search for subjects with many diagrams.
Conclusion
When Solvely.ai set out to democratize education through artificial intelligence in 2023, they knew that technical infrastructure would be critical to their mission. What they didn't anticipate was how quickly they would outgrow their initial vector database solution. As their question database grew to hundreds of millions of entries and their user base scaled to nearly 10 million students worldwide, query latency became a bottleneck that threatened the very user experience they were trying to perfect.
The migration to Zilliz Cloud transformed their technical foundation. Query latency dropped by 70%, infrastructure costs fell by 60%, and most importantly, students could get help with their homework 4 to 5 times faster when the system matched questions from their curated database. But beyond the numbers, Zilliz Cloud gave Solvely something more valuable: the freedom to focus on building innovative educational products rather than wrestling with database operations.
As Solvely.ai continues to expand its product offerings and user base, Zilliz Cloud provides the scalable, cost-effective vector database foundation needed to serve millions of students worldwide, bringing the vision of educational equality closer to reality.
The migration was incredibly smooth. Using the built-in tools, we were able to import our data from Pinecone with essentially one click. The technical support has also been excellent — our questions get resolved almost instantly, and the documentation, demos, and examples are thorough and easy to work with.
Technical Team