BentoML and Zilliz Cloud Integration
BentoML and Zilliz Cloud integrate to build end-to-end AI applications: BentoML's open-source AI inference platform serves and deploys machine learning models, while Zilliz Cloud's high-performance vector database provides scalable embedding storage and retrieval for RAG systems.
What is BentoML
BentoML is an open-source AI inference platform for serving and deploying machine learning models. It bridges development and operations by streamlining production deployment, encapsulating models, dependencies, and inference logic into standardized units called "Bentos." The platform offers high-performance API serving over HTTP, gRPC, and CLI interfaces, with deployment to Docker containers, Kubernetes, and cloud platforms. BentoCloud, its managed service, provides pre-built models, including Llama 3, Stable Diffusion, CLIP, and Sentence Transformers, with single-click deployment.
By integrating with Zilliz Cloud (fully managed Milvus), BentoML enables developers to convert unstructured data into vector embeddings using served models and store them in a scalable vector database for efficient retrieval, powering end-to-end RAG applications, semantic search, and recommendation systems with minimal infrastructure overhead.
Benefits of the BentoML + Zilliz Cloud Integration
- Single-click model deployment with scalable storage: BentoCloud provides instant deployment of state-of-the-art embedding and LLM models, while Zilliz Cloud handles the vector storage and retrieval at scale.
- Access to advanced models: BentoCloud offers immediate availability of cutting-edge AI models like Llama 3 and Sentence Transformers without training requirements, with embeddings efficiently stored and searched in Zilliz Cloud.
- Reduced infrastructure burden: Both managed services minimize setup and maintenance overhead, allowing teams to focus on building AI applications rather than managing infrastructure.
- Flexible deployment options: BentoML supports self-hosting via its open-source framework alongside BentoCloud's managed service, with Zilliz Cloud providing the vector database layer in either deployment model.
- Standardized model serving: BentoML standardizes model deployment across ML frameworks (PyTorch, TensorFlow, scikit-learn), while Zilliz Cloud provides a consistent vector storage interface regardless of the model framework used.
How the Integration Works
BentoML serves as the AI inference platform, hosting and serving embedding models and LLMs through BentoCloud or self-hosted deployments. It handles model serving via HTTP endpoints, enabling applications to generate embeddings from text data using models like Sentence Transformers and generate LLM responses using models like Llama 3.
Zilliz Cloud serves as the vector database layer, storing and indexing the embeddings generated by BentoML-served models. It provides high-performance similarity search with low latency, enabling efficient retrieval of the most relevant context from large collections.
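Similarity search at this layer ranks stored vectors by how close they are to the query vector under a chosen metric; this guide's collection uses cosine similarity. A minimal pure-Python sketch of the idea (the tiny 2-dimensional vectors and document names are illustrative only — real embeddings have hundreds of dimensions):

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
docs = {"doc_a": [0.9, 0.1], "doc_b": [0.0, 1.0]}

# Rank documents by cosine similarity to the query, most similar first
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # doc_a points almost the same way as the query, so it ranks first
```

A vector database performs this ranking over millions of vectors using approximate nearest-neighbor indexes rather than the exhaustive scan shown here.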
Together, BentoML and Zilliz Cloud create a complete RAG solution: BentoML serves embedding models that convert text data into vectors, which are stored in Zilliz Cloud. When a user asks a question, BentoML embeds the query, Zilliz Cloud retrieves the most relevant documents through similarity search, and BentoML's LLM service generates a contextually informed response based on the retrieved context.
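The flow above can be sketched with stand-in functions. Everything below is an illustrative placeholder, not the BentoML or Milvus API: `embed` fakes the served embedding model, `vector_search` fakes the Zilliz Cloud retrieval step, and `generate` fakes the served LLM.

```python
def embed(texts: list) -> list:
    # Stand-in for the BentoML-served embedding model (one fake 1-D vector per text)
    return [[float(len(t))] for t in texts]


def vector_search(query_vector: list, store: list, top_k: int = 2) -> list:
    # Stand-in for Zilliz Cloud similarity search: nearest by 1-D distance
    return sorted(store, key=lambda item: abs(item["vector"][0] - query_vector[0]))[:top_k]


def generate(prompt: str) -> str:
    # Stand-in for the BentoML-served LLM
    return f"Answer based on: {prompt}"


# Indexing phase: embed documents and store the vectors alongside the text
documents = ["Cambridge is in Massachusetts.", "Seattle is in Washington."]
store = [{"vector": v, "text": t} for v, t in zip(embed(documents), documents)]

# Query phase: embed the question, retrieve context, generate a grounded answer
question = "What state is Cambridge in?"
hits = vector_search(embed([question])[0], store)
context = " ".join(hit["text"] for hit in hits)
print(generate(f"Context: {context}\nQuestion: {question}"))
```

The remainder of this guide implements each of these three roles with the real services: `SyncHTTPClient` for embedding and generation, and `MilvusClient` for storage and search.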
Step-by-Step Guide
1. Install Required Packages
```shell
$ pip install -U pymilvus bentoml
```

2. Serving Embeddings with BentoML/BentoCloud
Import bentoml and set up an HTTP client using SyncHTTPClient by specifying the endpoint and, optionally, the token:

```python
import bentoml

BENTO_EMBEDDING_MODEL_END_POINT = "BENTO_EMBEDDING_MODEL_END_POINT"
BENTO_API_TOKEN = "BENTO_API_TOKEN"

embedding_client = bentoml.SyncHTTPClient(
    BENTO_EMBEDDING_MODEL_END_POINT, token=BENTO_API_TOKEN
)
```

3. Prepare and Process Data
Read files and preprocess the text, then download the city data:
```python
import os
import requests
import urllib.request


def chunk_text(filename: str) -> list:
    with open(filename, "r") as f:
        text = f.read()
    sentences = text.split("\n")
    return sentences


repo = "ytang07/bento_octo_milvus_RAG"
directory = "data"
save_dir = "./city_data"
api_url = f"https://api.github.com/repos/{repo}/contents/{directory}"

response = requests.get(api_url)
data = response.json()

if not os.path.exists(save_dir):
    os.makedirs(save_dir)

for item in data:
    if item["type"] == "file":
        file_url = item["download_url"]
        file_path = os.path.join(save_dir, item["name"])
        urllib.request.urlretrieve(file_url, file_path)
```

Process each file and generate embeddings:
```python
cities = os.listdir("city_data")
city_chunks = []
for city in cities:
    chunked = chunk_text(f"city_data/{city}")
    cleaned = []
    for chunk in chunked:
        if len(chunk) > 7:
            cleaned.append(chunk)
    mapped = {"city_name": city.split(".")[0], "chunks": cleaned}
    city_chunks.append(mapped)


def get_embeddings(texts: list) -> list:
    # The embedding endpoint accepts at most 25 sentences per call, so batch larger inputs
    if len(texts) > 25:
        splits = [texts[x : x + 25] for x in range(0, len(texts), 25)]
        embeddings = []
        for split in splits:
            embedding_split = embedding_client.encode(sentences=split)
            embeddings += embedding_split
        return embeddings
    return embedding_client.encode(sentences=texts)


entries = []
for city_dict in city_chunks:
    embedding_list = get_embeddings(city_dict["chunks"])
    for i, embedding in enumerate(embedding_list):
        entry = {
            "embedding": embedding,
            "sentence": city_dict["chunks"][i],
            "city": city_dict["city_name"],
        }
        entries.append(entry)
```

4. Insert Data into Milvus
Initialize a Milvus Lite client, create a collection with schema and index, and insert the data:
```python
from pymilvus import MilvusClient, DataType

COLLECTION_NAME = "Bento_Milvus_RAG"
DIMENSION = 384

milvus_client = MilvusClient("milvus_demo.db")

schema = MilvusClient.create_schema(
    auto_id=True,
    enable_dynamic_field=True,
)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=DIMENSION)

index_params = milvus_client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",
    metric_type="COSINE",
)

if milvus_client.has_collection(collection_name=COLLECTION_NAME):
    milvus_client.drop_collection(collection_name=COLLECTION_NAME)
milvus_client.create_collection(
    collection_name=COLLECTION_NAME, schema=schema, index_params=index_params
)

milvus_client.insert(collection_name=COLLECTION_NAME, data=entries)
```

As for the argument of `MilvusClient`: setting the `uri` to a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically uses Milvus Lite to store all data in this file. If you have a large amount of data, you can set up a more performant Milvus server on Docker or Kubernetes. If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the Public Endpoint and API Key in Zilliz Cloud.

5. Set Up the LLM and Build RAG
Deploy an LLM on BentoCloud and set up the RAG function:
```python
BENTO_LLM_END_POINT = "BENTO_LLM_END_POINT"

llm_client = bentoml.SyncHTTPClient(BENTO_LLM_END_POINT, token=BENTO_API_TOKEN)


def dorag(question: str, context: str):
    prompt = (
        f"You are a helpful assistant. The user has a question. "
        f"Answer the user question based only on the context: {context}. \n"
        f"The user question is {question}"
    )
    # The LLM endpoint streams tokens; concatenate them into the full response
    results = llm_client.generate(
        max_tokens=1024,
        prompt=prompt,
    )
    res = ""
    for result in results:
        res += result
    return res
```

6. Ask a Question with RAG
Search Milvus for relevant context and generate a response:
```python
question = "What state is Cambridge in?"


def ask_a_question(question):
    embeddings = get_embeddings([question])
    res = milvus_client.search(
        collection_name=COLLECTION_NAME,
        data=embeddings,
        anns_field="embedding",
        limit=5,
        output_fields=["sentence"],
    )
    sentences = []
    for hits in res:
        for hit in hits:
            sentences.append(hit["entity"]["sentence"])
    context = ". ".join(sentences)
    return context


context = ask_a_question(question=question)
print(dorag(question=question, context=context))
```

Learn More
- Retrieval-Augmented Generation (RAG) with Milvus and BentoML — Official Milvus tutorial for building RAG with BentoML
- RAG Without OpenAI: BentoML, OctoAI and Milvus — Zilliz blog on building RAG without OpenAI
- Infrastructure Challenges in Scaling RAG with Custom AI Models — Zilliz blog on scaling RAG with BentoML and Milvus
- BentoML Documentation — Official BentoML documentation
- BentoCloud Documentation — BentoCloud managed service documentation