- Events
Building Resilient AI Infrastructure: A Deep Dive into Zilliz Cloud's New Production-Ready Features
Webinar
Building Resilient AI Infrastructure: A Deep Dive into Zilliz Cloud's New Production-Ready Features
Join the Webinar
About this webinar
As AI applications move from experimentation to production, developers and technical leaders face new challenges in managing vector databases at scale. This webinar introduces Zilliz Cloud's latest features designed to address these challenges head-on, enabling you to build and maintain robust, efficient AI infrastructure. Join us to explore how these new capabilities can solve common problems in AI infrastructure management.
Topics covered
- Implement seamless data migration and recovery strategies, ensuring data portability and eliminating vendor lock-in
- Streamline integration of unstructured data from diverse sources, enhancing the breadth and depth of your AI models
- Optimize query performance and system availability for high-traffic AI applications
- Automate resource allocation to handle variable workloads efficiently
- Achieve enterprise-grade reliability and monitoring for mission-critical AI systems
Through technical demonstrations and real-world use cases, you'll gain practical insights into leveraging these features to overcome bottlenecks, enhance system resilience, and accelerate your AI initiatives.
So today I'm pleased to introduce this session, the September 2024 Zilliz Cloud product launch: Building Resilient Data Infrastructure for AI. It's going to be me and John, your old friend, who is the head of Ecosystem at Zilliz, giving you this talk. John, do you want to say hi to everyone? Hi, thanks, Stephanie, for the introduction. Hello, everyone. Okay.
So let's talk about what we released last month. We're actually very excited about this. Since OpenAI released their product about 18 months ago, lots of developers have tried out vector databases, using them for things like recommender systems. Lots of teams prototyped their products, and gradually they said, okay, we are in production right now, and we really need your help to make sure everything runs smoothly.
So that's what this launch is all about. We just want to give you more control of your data, help you tackle those big, complex AI apps, and make sure everything runs smoothly and securely when you are in production. In this launch, we categorize the features into three buckets. The first one is data sovereignty.
Basically, we provide migration services between different vector database systems, and we have the Fivetran source connector, which is now GA. For the many of you who know Fivetran, the really good thing about them is their connections: they have already built a portfolio of more than 500 systems. With that, it's much easier for you to retrieve data, including unstructured data, from every system Fivetran connects with. The second bucket is high performance.
This is really about providing higher throughput through replicas, with better fault tolerance. We also want to help you remove some operational burdens through autoscaling; we'll talk a little more about that later. The last piece is about security and reliability. We have actually been working on metrics and alerts for the past few months, released lots of new metrics and alerts, and we're continuing to work on them. We'll talk a little more about that later, too.
Also, a very important feature we released last month is the 99.95% uptime SLA. It's probably better than what any other major vector database provider offers at this moment. We really want to make sure your service is always up, and to minimize downtime. So that's what we're doing with this launch.
So let's jump into a little more detail on the features. The first one, as I mentioned, is the migration service. We all know it is really critical to ensure seamless and reliable data migration. For vector databases, it's actually something new; very few people or companies have accomplished it. But we all know it's very important to help you avoid vendor lock-in, deal with data backup and recovery, and so on.
That's why the first phase of the migration service we released really targets zero-loss migration of structured data. For now we support batch import, but later this month we're actually going to release real-time data streaming as well. We also aim to simplify the transformation step, so we're providing tools, such as embedding service integration, to make it even easier. And the last piece, as I mentioned, is that quality is a really big issue when you migrate data.
We want to ensure end-to-end data quality, and we're going to do that through robust monitoring and alerting, so you don't have to worry if anything goes wrong in the process. So let's look at how exactly it works. John, do you want to share a little more detail on how the migration service works? Of course.
So the migration service is really a piece of infrastructure that we have developed so that you don't need to worry about reliability or data loss during the migration process, which used to be a lot of engineering work that had to be handled by developers. Think about it: during the migration, what if just one request is lost to a network error? You have to retry, and you have to make sure that you have migrated all of the data points. This migration service and the infrastructure behind it will automatically solve all of those problems for you, and it connects to a rich list of data sources, both unstructured data sources and vector storage sources. Right now we support Elasticsearch, plus vector storage sources like pgvector and Milvus, and future releases will support other vector data sources and unstructured data sources as well.
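The retry behavior John describes can be sketched in a few lines. This is illustrative only, not the actual migration-service code; the function names and delays here are made up:

```python
# Sketch of the retry behavior described above: re-send a failed batch
# a bounded number of times so a transient network error does not lose
# data. Illustrative only; not the actual migration-service internals.
import time

def send_with_retry(send, batch, max_attempts=3, base_delay=0.01):
    """Call send(batch); on failure, back off exponentially and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(batch)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up; the caller decides what to do next
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

# Toy usage: a flaky sender that fails twice, then succeeds.
calls = {"n": 0}
def flaky_send(batch):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return len(batch)

migrated = send_with_retry(flaky_send, [1, 2, 3])
print(migrated)  # 3 — all records delivered despite two failures
```

The point is that the service, not the developer, owns this bookkeeping for every batch in flight.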
And what's powering this is actually open source. It is called VTS; you can check out the open-source repo under zilliztech on GitHub. Internally, it has source and sink connectors, so it can connect to any source and any sink as long as you have the adapter. In between is the transformer, the transform stage, where it can do some data cleaning, say, selecting particular fields from the source data. In the future it will also support interacting with embedding model services, so that it can do the vectorization in between.
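The source → transform → sink flow just described can be sketched as a minimal pipeline. All names here are illustrative; this is not the real VTS or Zilliz Cloud API:

```python
# Minimal sketch of a source -> transform -> sink migration pipeline,
# in the spirit of the VTS design described above. Illustrative only.

def read_source(records):
    """Source connector: yields raw records from the origin system."""
    for record in records:
        yield record

def transform(record, keep_fields):
    """Transform stage: keep only selected fields (data cleaning).
    A real pipeline could also call an embedding service here."""
    return {k: v for k, v in record.items() if k in keep_fields}

def write_sink(records, sink):
    """Sink connector: appends records to the destination store."""
    for record in records:
        sink.append(record)

def migrate(source_records, sink, keep_fields):
    cleaned = (transform(r, keep_fields) for r in read_source(source_records))
    write_sink(cleaned, sink)

# Toy usage: migrate two records, keeping only the id and vector fields.
source = [
    {"id": 1, "vector": [0.1, 0.2], "debug_note": "drop me"},
    {"id": 2, "vector": [0.3, 0.4], "debug_note": "drop me too"},
]
dest = []
migrate(source, dest, keep_fields={"id", "vector"})
print(dest)  # the debug_note field is dropped in transit
```

The adapter idea is that only `read_source` and `write_sink` change per system; the middle stays the same.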
To guarantee the reliability and correctness of the migration, there are some other components, such as checkpoint modules and monitoring modules, that make sure the whole process works smoothly. The service we're launching right now is available in Zilliz Cloud, and you can migrate your local Milvus instance, or the vectors from your pgvector instance or Elasticsearch, to Zilliz Cloud for a more performant and more scalable vector search service. All right, I think we can dive into a bit of a demo of this. Yep.
I already showed that the next step is the demo. Okay. Okay. I'll let you share your screen. Okay.
I'll share my screen here. As you can see, this is my Zilliz Cloud account. Once I log in, I can select a particular cluster and then check out the collections in that cluster. Here on the left navigation bar, you can click into Migrations to migrate data from or to a Zilliz Cloud cluster. For example, say I have an existing Zilliz Cloud cluster and I want to migrate to a new one: if I want to bump up from a serverless cluster to a dedicated cluster, or I want to duplicate my data for other purposes, I can click here.
I'll just show this example of a migration from the same cluster. Within this, you can select a source cluster to migrate from; for example, I select this one. Then you can migrate it to an existing cluster, which means you can select one. I can show a bit here: say I have two collections in this cluster and I just want to migrate one particular collection. I can migrate it to another cluster; I'll just name it something like migration-test, and then you can run the migration. The other feature is that you can migrate to a new cluster, and as part of this process it will actually guide you through the cluster creation process.
Say I have an existing cluster in serverless that was cost-efficient, but I do want more performance and want to migrate to a dedicated cluster. You don't have to do this yourself. You can just get here, select a project, and name the cluster, say, "this is my new cluster." Then you can select where you want to deploy this cluster, like AWS or GCP; here I'll just keep the default choice.
Then you can select the settings of this cluster: do you want performance-optimized or capacity-optimized? They have the same CU price, but each CU can hold a different amount of vectors, because they have different performance and cost implications. Let's say I do want performance, so I'm selecting the performance-optimized cluster type. I'm storing fewer vectors in one CU, but this gives me more performance. After setting everything, I can just click on the migration button, and here the new cluster is being created for me.
You can download the username and password, which is just another way to authenticate; usually what we do is use the API key as the token for authentication. And you can see the public endpoint here. Right now there's no collection, because the migration process has just started, but you can check the progress in the Jobs section, right below Migrations. Here you can see I have done a migration before, successfully migrating data to a new cluster called migration-test.
And here the new migration job that I have just created is in progress. Once it has set up the collections, it will also show you a progress indicator, like "10% of the data has been migrated," things like that. In addition to migrating between Zilliz Cloud clusters, you can also migrate from your local, self-hosted Milvus instance. Say you have a Milvus instance deployed on your EKS or self-maintained Kubernetes cluster; you can migrate it by dumping a backup file.
Or, if you have more data, or you don't want a temporary file storing all of the data, you can specify the endpoint of your local Milvus cluster. You just need to specify the network endpoint, username, and password, and it will automatically migrate everything for you. Everything else looks similar to the previous process. You'll also have a chance to configure the schema, which means you can do some selection of the fields during the migration process, so that if you don't want some part of the data, some field, you can just drop it. The same applies to Elasticsearch and pgvector.
It's the same idea: you just need to specify your Elasticsearch endpoint and the API key, or username and password. Okay, I think that's pretty much it. Oh, let's keep going. Yeah.
Okay, let me go back to sharing my screen. Sorry, one second. Oh, I cannot find that screen anymore. Oh, here.
Sorry about that. Let's talk about what's next: the Fivetran connector. As I mentioned earlier, this partnership with the Fivetran team is really exciting for us, because they have already done the really hard job of connecting more than 500 systems, and probably lots of companies use many of them, like Snowflake, Salesforce, Gong, and Asana. So you probably already have lots of unstructured data stored in those systems.
Through Fivetran, we can extract unstructured data and use the OpenAI embedding services to do the vectorization; then we can just use the source or destination connectors to connect with Zilliz Cloud or Milvus. John, do you want to talk a little more about this process and maybe show a demo to everyone? Of course. The Fivetran team is really doing a great job of making pretty much every business app and data source connectable through their platform. And, just like the migration service, it solves the problems of reliability and correctness: it automatically implements the retry logic, and it has all sorts of infrastructure support to make sure that the data can be synced in real time and reliably, without losing any data records.
Here is some of what you can unlock with the Fivetran integration: you can make pretty much every single business data source searchable through this connection. Say I have some data in Salesforce or Zendesk, or GitHub issues and Slack messages; it would take a lot of effort to build a data pipeline to make them searchable without the Fivetran-Zilliz integration. With it, it is as simple as this: you have the data source, you set it as a source connector in Fivetran, and Fivetran has pre-implemented business logic to join the various tables in the data source. Say GitHub has the issue table, the user profile table, the comment table, a lot of tables, but usually for search you want to flatten them into one single table with pretty much all the information, so it's easier for you to search on the concatenated text and then maybe filter on the metadata. Fivetran has this functionality of joining those tables and then syncing all of the new and existing data records from the source tables to a data warehouse.
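The flattening step just described can be sketched like this. The field names mimic a GitHub-style schema and are illustrative only:

```python
# Sketch of the table-flattening step described above: join an issue
# with its comments into one searchable text record. Field names are
# illustrative, not an actual Fivetran or GitHub schema.

def flatten_issue(issue, comments):
    """Concatenate title, body, and comments into one text field,
    keeping the URL and timestamp as metadata for filtering."""
    parts = [issue["title"], issue["body"]] + [c["body"] for c in comments]
    return {
        "text": "\n".join(parts),   # single concatenated text to embed
        "url": issue["url"],        # metadata kept for filtering
        "created_at": issue["created_at"],
    }

issue = {"title": "Crash on startup", "body": "It fails to boot.",
         "url": "https://example.com/issues/1", "created_at": "2024-09-01"}
comments = [{"body": "Same here."}, {"body": "Fixed in v2.1."}]
doc = flatten_issue(issue, comments)
print(doc["text"])
```

One embedded `text` field per issue is what makes the downstream vector search simple.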
Say we select Snowflake as the data warehouse; then you have the flattened, normalized table in your Snowflake data warehouse. Then you can set Snowflake as the data source and Milvus as the destination. In between, during the data movement process, the Fivetran connector will call the embedding service. Right now it supports the OpenAI embedding service, and this does the vectorization in a streamlined fashion, so that after this process you have the vectors plus all of the text and labels, say the sender of the message, the timestamp of the message, everything ported into Zilliz Cloud.
And this works in a streaming fashion, meaning that for new data records showing up in GitHub, say someone creating a new issue, Fivetran will automatically detect that and then sync it to Zilliz Cloud or Milvus. Want to see this in action? Yes, let's look at a demo. Yeah. Okay.
So here is my Fivetran account. Under destinations, you can create a new destination of the Milvus type. Right now it's in public preview, and it is partner-built, which means we worked with Fivetran together to implement the connector logic. You can select it and type a name for it. Then, on this page, it will let you specify the credentials of your Milvus instance or Zilliz Cloud instance. I can show some of this. Let's see.
Say I want to migrate from this test account: you can just copy the endpoint and token. And of course, because it will use OpenAI, you'd better have your OpenAI key ready somewhere. I will just show a quick example here: you specify your endpoint and token here, and you also need to specify the OpenAI key.
Moreover, you need to specify the OpenAI embedding model that you wish to use. This is important because this embedding model will be used for ingesting data into Milvus, and it must be the same embedding model that you will use to embed your queries. Say a user asks a question: you want to embed it into a vector and then search through the existing vectors in the Milvus collection, and that embedding model needs to be the exact same one. Otherwise, they just don't match each other.
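A toy illustration of why the models must match: vectors produced by different embedding models generally have different dimensions (and live in different spaces), so there is nothing meaningful to compare. The numbers below are made up:

```python
# Toy illustration of why the query-side embedding model must match the
# ingestion-side model: vectors from different models cannot be
# meaningfully compared (here, they don't even share a dimension).
import math

def cosine_similarity(a, b):
    if len(a) != len(b):
        raise ValueError("embedding dimensions do not match")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

stored = [0.6, 0.8]                  # e.g. embedded at ingestion time
query_same_model = [0.6, 0.8]        # query embedded by the same model
print(cosine_similarity(stored, query_same_model))  # ≈ 1.0, comparable

query_other_model = [0.1, 0.2, 0.3]  # different model, different dims
try:
    cosine_similarity(stored, query_other_model)
except ValueError as e:
    print(e)
```

Even when two models happen to share a dimension, their vector spaces differ, so similarity scores across models are meaningless.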
There just isn't a way to search across them. So you specify this, and you specify the data processing location and the cloud provider where this connector will run; usually I would just keep the default choice. Then you can save and test, and that becomes a destination. For the connector part, you can add connectors. Usually what you want to do first is establish the connection between GitHub, or any data source, say Zendesk or Salesforce, and the data warehouse. For that, you need to create another destination of the Snowflake type.
Then you specify your Snowflake credentials, and in your Snowflake account you need to follow the instructions and set up an account for Fivetran to use. After that, you can port your GitHub data to Snowflake. Here I will just do a partial demo of this: you can select a Snowflake account as a destination, and then you can authenticate your account with GitHub via OAuth or other authentication methods.
Here I will just authorize this and see what happens. Let me see. Okay, so here I authorized my GitHub account. Then you can select: you can sync all of the repositories, or you can sync a specific repository that you have access to. I will just do a random demo here.
With this, it will start testing the connection, and once all the connection tests pass, you can port this data to your Snowflake account. Once you have that, there should be data in your Snowflake account already, and then you can create another connector to connect the Snowflake account to your Milvus destination. So here I can select the same Snowflake account that I was porting GitHub data to, then specify the credentials of the Snowflake account, and then establish the connection to the Milvus account. Sorry, I made a little mistake.
In this step, the destination should be your Milvus account. So here it's my Milvus destination deployed on Zilliz Cloud, and then you specify the credentials of your Snowflake account so that it can establish the connection between a Snowflake table and a Milvus collection. Okay. So, what can you do with this? Here's a demo that I created; let's look at the data first.
Here's a collection that I created before by syncing the data from a GitHub repo to Snowflake, and then from Snowflake to a Milvus collection. In this collection, I actually ported a lot of data. The schema looks like this: it has the original text, which is the concatenated issue topic, issue comments, and all the textual information from that issue. In addition, it also has some metadata, such as the URL of the issue.
We can check out this one. It's just an issue from this repository where people ask questions, and there are some answers. All of the questions, answers, and comments are concatenated together as one single piece of text, and that's the original text. It also has other metadata, like the creation timestamp, and of course it has the vector, which is the vector embedding generated from the original text.
With this, you can actually make a search app for your GitHub issues, or even better, you can do RAG on top of that. So here's an example app that we have created. Here we go. This is just a Streamlit app: I specify my Zilliz host, which is the network endpoint of the Milvus collection, plus the token and the OpenAI key.
Because for RAG, it needs to use the large language model and also the embedding model for encoding queries. Then you set this embedding model to the one that you chose for ingesting the data; in this case, I was using one of the embedding-3 models. Oops, it's broken. Yeah, let me try this.
Okay, I'll just show this again. Let me specify the credentials here. Oops, here we go. So I specify the endpoint, my token, and my key. Okay, so now it lists all the existing collections in this table, sorry, in this cluster. Let me select this one, and now I can ask some questions.
Let's see: "does it support loading a local model path?" Just asking a random question to see what it returns. Okay, so behind the scenes, it will actually embed this query text into a vector and then do the vector search on this particular collection. Then, with the returned data, it will call the large language model to do the generation.
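The behind-the-scenes flow just described (embed the query, vector-search the collection, feed the hits to the LLM) can be sketched as below. The toy `embed()` here is a stand-in for a real embedding model, and the document texts are made up:

```python
# Sketch of the RAG flow described above: embed the query, brute-force
# a vector search over the collection, then build the LLM prompt from
# the top hits. The letter-frequency embed() is a toy stand-in.
import math

def embed(text):
    """Toy embedding: normalized letter-frequency vector
    (a real app would call an embedding model instead)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def search(query_vec, collection, top_k=2):
    """Return the top_k most similar documents by dot product
    (cosine similarity, since the vectors are normalized)."""
    scored = [(sum(q * d for q, d in zip(query_vec, doc["vector"])), doc)
              for doc in collection]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

collection = [{"text": t, "vector": embed(t)} for t in
              ["milvus supports local models",
               "how to deploy on kubernetes",
               "pricing for dedicated clusters"]]

hits = search(embed("does it support a local model"), collection)
prompt = "Answer from context:\n" + "\n".join(h["text"] for h in hits)
print(hits[0]["text"])
```

In the real app, `prompt` (query plus retrieved context) is what gets sent to the large language model for generation.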
In this case, it actually fetched these sources of information. Let me see what this talks about. Okay, in this case, it's about trying to load a local encoder model. Well, it looks like it referenced the search results, but I don't think this is the answer I was looking for. Let me ask it another way.
"Does it support locally stored models?" Okay, it looks like it's telling me yes; let's see if there's a reference for that. Okay, it looks like there's exactly the issue that talks about this topic, and here, yeah, someone is answering that the model does support a checkpoint path, which can be locally cached. All right, I think that's pretty much the demo. If you want to try this, I will just send the link to this demo app in the Zoom chat. And we have a dedicated session, demonstrated with Fivetran, which walks through the whole process of setting up Fivetran connectors and a Milvus destination.
Then you can use this demo app to show that you can search your data from any business app, like GitHub or Zendesk. Okay, that's it. Does anyone have any questions? If you do, you can just paste them in the Q&A session; we're happy to answer any questions at any time. Okay, then let's keep going.
Another thing I find super exciting is the in-memory replicas. We heard this from quite a few customers: they don't have a very big dataset, but they require really high throughput, because they run consumer apps. Let's say they're doing a recommendation engine to recommend products to their users, with thousands of people online at the same time. That's why we have this replica feature, currently in public preview. The good news is that I think it's going to go GA either late this month or early next month. It basically helps you handle many more queries at the same time.
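As a rough back-of-the-envelope sketch of why replicas raise throughput: each replica can serve reads independently, so read capacity scales roughly linearly with replica count. The numbers here are made up for illustration:

```python
# Back-of-the-envelope sketch: read throughput scales roughly linearly
# with query replicas. The QPS figures below are made up.

def estimated_qps(qps_per_replica, replicas):
    """Rough read-throughput estimate with N query replicas."""
    return qps_per_replica * replicas

def replicas_needed(target_qps, qps_per_replica):
    """Smallest replica count that covers the target load."""
    return -(-target_qps // qps_per_replica)  # ceiling division

print(estimated_qps(500, 3))        # 1500 queries/sec with 3 replicas
print(replicas_needed(2200, 500))   # 5 replicas to cover 2200 QPS
```

Real scaling is sublinear under contention, but this is the intuition behind the feature.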
And for those companies that require much higher fault tolerance and really cannot deal with much downtime, you can always use it to minimize the downtime. Another really exciting feature is autoscaling. Again, this is something I heard from our team and from some customers: their apps actually have workloads that change quite a lot, and just keep increasing.
So they're really busy saying, hey, I want to scale up another CU, and this is a lot of work for them. So we created this feature, currently in private preview. Whenever the cluster reaches a certain preset rate of its capacity, let's say 70%, the system will automatically add another CU for you. And you don't have to worry about it scaling to an extent you are not comfortable with, because there's a maximum resource threshold you can set up there. For example, if the maximum you can deal with is, let's say, 32 CUs, you can set that up there.
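The autoscaling rule just described can be sketched in a few lines. This is illustrative only, not the actual Zilliz Cloud logic; the threshold and cap are the example values from above:

```python
# Sketch of the autoscaling rule described above: add one CU when
# utilization crosses a preset threshold, but never exceed the
# user-configured maximum. Illustrative; not the actual Zilliz logic.

def next_cu(current_cu, utilization, threshold=0.70, max_cu=32):
    """Return the CU count after one autoscaling check."""
    if utilization >= threshold and current_cu < max_cu:
        return current_cu + 1      # scale out by one CU
    return current_cu              # under threshold, or capped at max

print(next_cu(4, 0.75))    # 5 — over 70% utilization, scale out
print(next_cu(4, 0.40))    # 4 — plenty of headroom, stay put
print(next_cu(32, 0.90))   # 32 — at the cap, never exceed max_cu
```

The `max_cu` cap is what lets you review before scaling past your comfort level.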
So it will not go over 32 CUs. Whenever it reaches that threshold level, you can review it: maybe you want to optimize the system, or reach out to us, or say, okay, we do have lots of data and we want to continue to scale it. But the system will not automatically scale beyond the maximum for you. The last piece I'll share here is about metrics and monitoring. As I mentioned, the team worked really hard over the past few months on all the alerts and dashboards, to make sure you get the alerts, monitor your performance, and use your existing tools, especially to minimize service downtime.
For the alerts themselves, right now we support 39 organizational and project-level alerts. Some of them are pre-configured, but many more of them you can customize: whatever kind of alert you want to receive, you can just set it up in the UI. The next piece is about the metrics. Right now we provide 18 metrics, ranging across resource usage, QPS, request results, and data operations.
And there are lots of customization tools in the UI, so you can do some in-depth analysis; for example, if you want to select a certain time period, you are allowed to do that right in the UI. Then the last really important piece is about integration. Right now we're integrated with New Relic and Grafana, so if you are their user, you can just connect Zilliz Cloud with New Relic or Grafana and do the monitoring there. The team is also working hard on the Datadog and Prometheus integrations; I think those are going to come out either later this month or early next month.
We'll definitely make sure you all know. So if Datadog or Prometheus is your current monitoring tool, very soon you'll be able to leverage them to monitor Zilliz Cloud as well. I see there's a question about Dynatrace; that's a great question. John, I don't know if we have a plan to support Dynatrace yet, but if you do have the need, always connect with your account manager and send out a support request. We'll put it on our roadmap and make sure it gets released sometime in the near future.
Yeah, we don't have a plan for Dynatrace at this moment, but we'd like to consider it as another monitoring integration option that we'll provide; we'll probably start with Datadog and Prometheus. But yes, please talk to your account manager or send us a support message; thanks for your feedback, and we'll consider it. Okay, so those are all the features we wanted to talk about for this launch. But there's one more thing.
I don't know how many of you have heard of this, but our serverless offering actually went GA last month. It's been quite a few months, I think about five, since we entered public preview, and we received lots of feedback.
So this is definitely on the right track. Originally, for the commercial offering, we mostly provided two types of clusters: standard and enterprise dedicated. Both of them are more infrastructure-focused, and they have different kinds of use cases.
For example, standard is really more for POC and development, while enterprise is really for mission-critical use cases, with lots of configuration options. But we heard from lots of developers saying, hey, we're just getting started; we're really not sure what our workload will be, or our workload actually fluctuates a lot, and we really just don't have the resources to scale the cluster up and down all the time. So we hear you.
So we released serverless in public preview, I think in April, and last month it finally went GA. We would position it more for serverless-style applications: if your traffic is infrequent, or you are really not sure what it will be, or you are just getting started with your application and want something a little beyond what the free tier provides, you can start with serverless and see if it's the right option for you. As John showed earlier with the migration service, you can easily migrate to a dedicated cluster one day if you want, or you can just stay with serverless if it meets all your needs. You can see some differences I've highlighted here: for serverless, you actually don't have any CU options.
You don't have to think about which compute infrastructure works better for your needs, and you don't have to worry about scaling. I talked about autoscaling for dedicated clusters earlier, and that is for scaling up; for serverless, you don't have to think about it at all, because there's no CU concept.
You just use as much as you want and never have to worry about whether you should scale your infrastructure or not. For now there's no SLA, but in the future we're definitely going to add one, to make sure it can work for production use cases. And you still get lots of security and compliance if that's required: we have data encryption, RBAC, and the major compliance certifications, SOC 2, GDPR, and HIPAA, already. And there's support behind it.
The support tier is similar to the dedicated standard cluster support tier. So that is our serverless offering. If you haven't tried it, you probably should just try it right now; there are some free options, so you can use it up to a certain point for free and see whether it meets your needs.
John, I think you also prepared a small demo for serverless. Yeah, exactly. Let me show my screen here. Okay. Actually, let's show the serverless introduction webpage here first. If you want to learn more about serverless, you can go through this page; it shows the pricing plan and also all of the features it enables. The key idea behind serverless is that it provides a vast cost reduction compared to dedicated. This is good for workloads where you don't really have a very high requirement on, say, search latency, because it uses a shared resource pool: it provides a great cost reduction, but it is not as performant as a dedicated cluster. More importantly, it provides less predictability of the search latency compared to dedicated.
That's because on dedicated you have dedicated machines and dedicated servers serving those requests, while serverless uses a shared pool of resources. If there's less contention in that resource pool, it may even match dedicated-level search latency, but in most cases the search latency will vary, so just bear that in mind.
But overall, this is a good option for cases where you're just starting to deploy your generative AI applications, like your RAG chatbot, and you're not sure how much traffic there will be or how much users will like the product. If you want to test with a cheaper option, serverless is a great choice. Moreover, we provide a free tier of serverless: if you have less than five gigabytes of storage, including vectors and all the metadata, you can use the free tier at no cost. So, let's try creating a serverless cluster here.
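As a rough back-of-the-envelope illustration of what fits in a 5 GB free tier (the 5 GB figure is from the talk; the 768-dimensional float32 vectors and the per-row metadata size below are illustrative assumptions, not documented limits):

```python
# Back-of-the-envelope: how many rows fit in the 5 GB free tier?
# Assumes 768-dim float32 vectors plus ~128 bytes of metadata per
# row -- both are assumptions made for this estimate only.

BYTES_PER_DIM = 4            # float32
DIM = 768                    # a common embedding dimensionality
METADATA_BYTES = 128         # rough per-row metadata estimate
FREE_TIER_BYTES = 5 * 10**9  # 5 GB (decimal)

row_bytes = DIM * BYTES_PER_DIM + METADATA_BYTES   # 3,200 bytes/row
max_rows = FREE_TIER_BYTES // row_bytes

print(f"~{max_rows:,} rows fit in the free tier")  # ~1,562,500 rows
```

Even under these conservative assumptions, the free tier holds well over a million vectors, which is plenty for the kind of early prototypes discussed here.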
In this cluster management tab, you can create a new cluster. You can create a free one, but an organization is only allowed one free cluster. You can also create a serverless one, which is charged not by CU but by an abstract unit called the vCU, which roughly measures how many operations you perform. It's charged at $4 per million vCUs used. You also need to specify the region.
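The pricing model just described comes down to simple arithmetic ($4 per million vCUs is the figure quoted in the talk; the workload number below is made up for illustration):

```python
# Sketch of the serverless pricing model: $4 per million vCUs used.
PRICE_PER_MILLION_VCU = 4.00

def monthly_cost(vcus_used: int) -> float:
    """Cost in USD for a given number of vCUs consumed."""
    return vcus_used / 1_000_000 * PRICE_PER_MILLION_VCU

# Hypothetical workload: 2.5 million vCUs in a month.
print(monthly_cost(2_500_000))  # 10.0
```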
Right now we support serverless clusters in the us-west1 region of Google Cloud. Then you can create the cluster here. All right, let's see how to use this in action. Here I've prepared a short demo of image search: I want to build a backend that powers image search, where I give one image as a query and then search for similar images.
We need to go through some setup, but I've done that previously. Basically, before deploying, you can use Milvus Lite, which runs locally, for example in this notebook environment; you just specify a URI, which is the local file name that all of the data is persisted to. So let's say you've developed this application with a locally running Milvus Lite, and of course you can run image search here. Now you want to deploy this to production to serve your real users, and you need a server.
You can use the Milvus server running on a Zilliz Cloud serverless cluster. The way it works is that instead of specifying that local file URI, you specify the URI as the endpoint of your serverless cluster, along with the token. So let's use the serverless cluster we just created. Let me see if it's ready. Okay, here we go.
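In pymilvus, the local-to-cloud switch described here amounts to changing the MilvusClient URI; a minimal sketch (the file name, endpoint, and token below are placeholders, not real values):

```python
from pymilvus import MilvusClient

# Development: Milvus Lite persists everything to a local file.
local_client = MilvusClient(uri="milvus_demo.db")  # placeholder file name

# Production: point the same client at a Zilliz Cloud serverless
# cluster by swapping in its endpoint and an API token.
cloud_client = MilvusClient(
    uri="https://<your-cluster-endpoint>",  # placeholder endpoint
    token="<your-api-token>",               # placeholder token
)
```

Because both clients expose the same API, the rest of the notebook runs unchanged after the swap.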
Let's copy the endpoint here. Oops. And of course the token. Okay, I'll paste in my token and the endpoint of the new cluster.
I'll uncomment this part and run this line of code. It creates a collection called image_embeds in this cluster. Let's do this. Okay, it looks like it succeeded. Let's refresh.
Okay, here we go. We have image_embeds here. Let's do a data preview; as expected, there's no data yet. Now let's ingest the vectors of a bunch of images: we have a folder with all sorts of images, and we'll ingest all of them.
What this does is go through every image in this train folder, use the embedding model to generate a vector, and then insert the vector along with a metadata field called file name into the image_embeds collection. Here we can see that it gradually adds data. Let's do a data preview again: now we can see that a lot of vectors have been ingested, and the file name field holds each image's file name. We can also try a vector search with any random vector. Say I want to search for vectors adjacent to this vector; let's do a top-K search here.
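Conceptually, the ingestion loop shown in the demo follows this pure-Python sketch; the stub embedder and the in-memory list are stand-ins for the real embedding model and the Milvus insert call, and the file names are hypothetical:

```python
import hashlib

def embed(path: str, dim: int = 8) -> list[float]:
    """Stub embedding model: derives a deterministic vector from the
    file name. A real pipeline would run the image itself through an
    embedding model instead."""
    digest = hashlib.sha256(path.encode()).digest()
    return [b / 255 for b in digest[:dim]]

# Stand-in for the image_embeds collection.
collection: list[dict] = []

# Stand-in for listing the train folder -- hypothetical file names.
train_folder = ["dog_001.jpg", "dog_002.jpg", "car_mirror_01.jpg"]

for filename in train_folder:
    vector = embed(filename)
    # A real pipeline would call the Milvus insert API here,
    # writing the vector plus the file-name metadata field.
    collection.append({"vector": vector, "filename": filename})

print(len(collection))  # 3
```

The pattern is the same at scale: one embedding per file, inserted together with whatever metadata you need to map results back to source images.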
Let me set top-K to three. What this does is search for the top three nearest neighbors of this query vector. Here we get three results, and it also shows the score, which is the distance between the target vector and each result vector. Very interestingly, one result scored 1.0, which is the highest possible score. Why is that? Because we searched for a vector that already exists in this collection, so it is exactly the same vector as the target one. The score depends on the distance metric you use. Let me see; it doesn't say which metric, but as I remember, it's probably cosine.
So if they are the same vector, the cosine similarity is exactly 1. Okay, I think the ingestion should have finished already. Let me see. Let's go ahead and try searching for a vector: let's try searching with this dog image and see what it returns.
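The scoring behavior noted above, an exact match scoring 1.0 under cosine similarity, can be reproduced with a small brute-force sketch (the 2-D toy vectors are made up; a real search runs server-side over the indexed collection):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], vectors: list[list[float]], k: int = 3):
    """Brute-force top-K nearest-neighbor search by cosine similarity."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, reverse=True)[:k]

# Toy collection of 2-D vectors.
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.6, 0.8]]

# Querying with a vector that already exists in the collection:
results = top_k([1.0, 0.0], stored, k=3)
print(results[0][0])  # 1.0 -- identical vectors have cosine similarity 1
```

This is why the demo's first hit scored 1.0: the query vector was itself a stored vector, so nothing in the collection could be closer.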
This is going very slowly. I thought I didn't use the GPU; well, I did use the GPU. Okay, let me just terminate this, because we already have enough vectors in the collection. Let's just do the search. Okay, here we go.
We got some lovely dogs returned, which are the nearest neighbors of this dog's vector. Let's try another example. I want to search for a car mirror; I'll uncomment this line and see what happens.
Okay, here we go. With the target query vector of the car mirror, we got a bunch of other car mirrors returned. So this is just a quick sneak peek into what you can do with serverless. If you want to explore more examples, you can go to milvus.io, where we have many other demos listed. Go to the docs page, then tutorials, and you can see what you can do with Milvus or Zilliz Cloud; serverless and dedicated clusters share the same API. You can build RAG, image search, and more.
You can also build multimodal RAG, and even GraphRAG with a knowledge graph and embeddings of knowledge entities in Milvus. So that's my quick demo of Zilliz Cloud serverless. Back to Stephanie. Great, I think that's all the content we have today. The next part is Q&A, so if you have any questions, please just shoot them out.
I'm sharing some of the helpful links that I presented before; feel free to check them out. Anyone have questions? Okay, if not, I think that will be pretty much it. Thanks everyone for staying with us, and feel free to check out the links and the newly released features: migration, the Fivetran connectors, and serverless clusters. Also, always reach out if you have any questions. Our Discord channel is always open at milvus.io/discord; you're welcome to join. And yes, this is recorded; you're going to receive the recording in a couple of days, I would say. Usually within a day or so.
We'll publish the recording, and you'll receive emails with the recording link and also the slides. Thanks, John, and thanks everyone for being with us for these 15 minutes. I'll see you sometime soon. Thanks, everyone.
See you. Bye-bye.
Meet the Speaker
Join the session for live Q&A with the speaker
Jiang Chen
Head of Ecosystem and Developer Relations
Jiang is currently Head of Ecosystem and Developer Relations at Zilliz. He has years of experience in data infrastructures and cloud security. Before joining Zilliz, he had previously served as a tech lead and product manager at Google, where he led the development of web-scale semantic understanding and search indexing that powers innovative search products such as short video search. He has extensive industry experience handling massive unstructured data and multimedia content retrieval. He has also worked on cloud authorization systems and research on data privacy technologies. Jiang holds a Master's degree in Computer Science from the University of Michigan.