How UNIwise Built a Scalable Plagiarism Detection Platform with Milvus

Cost-efficient at any scale
10,000+ documents: seamless processing in a single batch, with a path to tens of billions of vectors
Smarter plagiarism detection across European languages with semantic similarity search
Faster innovation with more engineering time for building new features
Milvus transformed our ability to detect semantic plagiarism at scale. We can now process variable workloads ranging from 10 to 10,000+ documents daily while maintaining cost-effectiveness, which would have been impossible with traditional solutions.
Teis Petersen
About UNIwise
UNIwise is a leading European provider of online examination solutions, trusted by universities for more than 12 years. Headquartered in Denmark, the company supports institutions throughout Scandinavia, the UK, and beyond. Its flagship platform, WISEflow, covers the full assessment lifecycle—from exam creation and delivery to grading, feedback, and integration with university Learning Management Systems (LMS).
Building on this foundation, UNIwise launched WISEflow Originality, a semantic plagiarism detection system powered by Milvus. By selecting Milvus over competing vector database solutions, UNIwise created a cost-efficient platform that can scale to billions of documents. With modern architecture and intelligent scaling strategies, WISEflow Originality delivers enterprise-grade performance and reliability, providing universities with a powerful tool to ensure academic integrity.
The Challenge: Scaling Beyond Legacy Plagiarism Detection
As European universities expanded their use of digital assessments, many began to outgrow legacy plagiarism detection tools. Existing systems, such as Turnitin, relied heavily on traditional text-matching techniques that were expensive to operate and struggled to scale with growing volumes of content. These methods often failed to capture semantic similarity, which made it hard to detect paraphrased content, particularly across different languages, a key need for European institutions.
To meet this demand, UNIwise set out to create WISEflow Originality, a platform capable of handling comparisons across billions of documents while keeping costs manageable. The system required semantic understanding beyond simple text matches and had to support multiple European languages, including Danish, Norwegian, Swedish, German, English, and Spanish. At the same time, it needed to integrate seamlessly with WISEflow, deliver results within a 24-hour SLA, and minimize infrastructure overhead.
From a business perspective, UNIwise had to build a complex data processing platform with a small engineering team while competing against established players with significantly larger resources. The company also needed to navigate EU public tender processes for university contracts while maintaining operational efficiency and cost-effectiveness at enterprise scale.
The Solution: Building a Semantic Detection Engine with Milvus
Early in the development of WISEflow Originality, UNIwise realized that vector databases could deliver the semantic comparison and scalability it needed at a fraction of the cost of traditional text-matching approaches. The team conducted a thorough evaluation of several vector search solutions, including Milvus, Weaviate, Redis Vector Search, and OpenSearch. Each option was scored against weighted criteria: stability, scalability for large datasets, performance optimization, standards compliance, community and support, and compatibility with existing tools.
Why Milvus Won
Milvus emerged as the strongest fit across multiple dimensions. Documentation quality was one of the deciding factors, as Teis Petersen, the engineering team lead at UNIwise, noted: “When you need to run a vector database and have no experience, you really, really want good documentation. It’s really, really key.” Milvus provided clear, accessible documentation that accelerated onboarding.
Just as importantly, Milvus is purpose-built for vector operations, unlike general-purpose databases with bolt-on vector search features, and offers superior scalability and performance. Its large, active open-source community and modern cloud-native architecture also gave UNIwise confidence in long-term support and flexible deployment strategies.
Technical Architecture
With Milvus at the core, UNIwise implemented a fully asynchronous data processing pipeline. The system pairs Milvus with a multilingual MiniLM sentence similarity model that produces 384-dimensional embeddings. Additional components include YOLOv3 for document layout detection and OCR models for text extraction. The orchestration layer combines Go services for API management and workflow coordination with Python services for machine learning, supported by an MLflow model repository. All components are deployed in a managed Kubernetes cluster on Amazon EKS.
The end-to-end workflow begins with document ingestion from WISEflow, followed by layout detection to remove irrelevant elements such as titles and page numbers. Text is then extracted, segmented, and embedded into vectors using the MiniLM model. Milvus indexes these embeddings and performs similarity search, after which the results are aggregated and presented directly within the WISEflow interface.
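The segment, embed, search, and aggregate steps of this workflow can be illustrated with a minimal, self-contained sketch. It is not UNIwise's implementation: the function names are hypothetical, toy_embed is a hashed bag-of-words stand-in for the MiniLM model (it captures word overlap, not meaning), and the brute-force loop stands in for the similarity search that Milvus performs. The 0.8 threshold is likewise an arbitrary assumption.

```python
import hashlib
import math
import re
from collections import defaultdict

DIM = 384  # same dimensionality as the embeddings described in the article


def segment(text):
    """Split text into sentence-like segments (naive punctuation split)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_embed(segment_text):
    """Hypothetical stand-in for MiniLM: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"\w+", segment_text.lower()):
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """For L2-normalised vectors the dot product is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def build_index(corpus):
    """Stand-in for inserting into Milvus: (doc_id, segment, vector) rows."""
    return [(doc_id, seg, toy_embed(seg))
            for doc_id, text in corpus.items()
            for seg in segment(text)]


def check_submission(text, index, threshold=0.8):
    """Search every segment of a submission; group hits by source document."""
    hits = defaultdict(list)
    for seg in segment(text):
        query = toy_embed(seg)
        for doc_id, src_seg, vec in index:
            score = cosine(query, vec)
            if score >= threshold:
                hits[doc_id].append((seg, src_seg, round(score, 3)))
    return dict(hits)


corpus = {
    "doc-a": "Vector databases store embeddings. They support similarity search.",
    "doc-b": "Exams are graded by assessors. Feedback is returned to students.",
}
index = build_index(corpus)
report = check_submission(
    "They support similarity search. The weather was pleasant today.", index
)
print(report)
```

In a production pipeline, the embedding step would call a real sentence model and the inner loop would be replaced by an approximate nearest-neighbour query against the vector database; only the overall shape (segment, embed, search, aggregate per source document) carries over.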
How Milvus Helped UNIwise Deliver Results
By selecting Milvus as the search foundation for WISEflow Originality, UNIwise easily addressed the technical challenges it faced. The platform now combines cost efficiency, scalability, and advanced detection capabilities in ways that legacy plagiarism detection tools cannot match.
Keeping costs in check while scaling
Milvus's cloud-native design gave UNIwise the flexibility to scale resources up and down on demand. This lets the team provision capacity for peak workloads only when needed, keeping infrastructure costs sustainable despite the large volume of data.
Smarter plagiarism detection with vector search
Unlike legacy systems limited to keyword or string matching, Milvus enables semantic similarity search across multilingual content. Combined with the MiniLM model, this allows UNIwise to detect paraphrased and restructured plagiarism across seven European languages.
Scalability for any workload
The separation of indexing and search in Milvus allowed UNIwise to scale each function independently. This made it possible to handle workloads ranging from a handful of documents to more than 10,000 in a single batch, with a clear path to tens of billions of vectors in the future. Now, the system can grow in line with university needs without requiring major architectural changes.
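To put "tens of billions of vectors" in perspective, the short calculation below estimates raw vector storage. It assumes 384-dimensional embeddings stored as 32-bit floats (4 bytes per value) and ignores index structures, metadata, and replication, none of which the article specifies; the function names are illustrative.

```python
def raw_vector_bytes(num_vectors, dim=384, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 values."""
    return num_vectors * dim * bytes_per_value


def to_terabytes(num_bytes):
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 1e12


for count in (10_000_000, 1_000_000_000, 10_000_000_000):
    size_tb = to_terabytes(raw_vector_bytes(count))
    print(f"{count:>14,} vectors -> {size_tb:.3f} TB")
```

Under these assumptions each vector takes 1,536 bytes, so ten billion vectors come to roughly 15 TB of raw floats before any index overhead, which is why storage cost features so heavily in the scaling discussion.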
Operational reliability with lean teams
Milvus provided UNIwise with a reliable backbone, delivering robust error handling. The availability of comprehensive documentation and a large open-source community also eased the learning curve, allowing UNIwise’s small engineering team to maintain and extend the system without excessive overhead.
More time for features that matter
With Milvus handling the heavy lifting of similarity search at scale, UNIwise was able to focus on building features that matter to universities. The open-source ecosystem continues to accelerate development, ensuring that WISEflow Originality remains competitive against legacy providers while evolving to meet new academic requirements.
Future Plans and Roadmap
UNIwise continues to build on the foundation established with Milvus. In the near term, the team plans to upgrade to Milvus 2.6 to leverage tiered storage for even greater cost optimization and to benefit from the latest performance enhancements.
This roadmap reflects UNIwise's commitment to continuous improvement: reducing costs, improving performance, and ensuring compliance, all while leveraging Milvus as the scalable core of its originality detection platform.
Conclusion
UNIwise’s journey with WISEflow Originality demonstrates how a focused team can challenge industry giants by pairing domain expertise with the right technology foundation. By adopting Milvus, UNIwise created a plagiarism detection platform that is cost-efficient, multilingual, and scalable to billions of documents—capabilities that traditional keyword-based systems struggled to deliver.
This success highlights the increasing importance of vector databases in educational technology. Milvus gave UNIwise the ability to handle massive workloads, adapt quickly to new requirements, and invest engineering resources in features that matter most to universities.
Looking ahead, UNIwise is positioned to continue shaping the future of digital assessment in Europe. With Milvus as a strategic backbone, the company can expand its originality detection capabilities while exploring new opportunities in semantic search and AI-driven learning tools.
If I were to choose again, I would still choose Milvus at this point. The scalability, documentation quality, and continuous innovation make it the right foundation for our plagiarism detection platform.
Teis Petersen