Webinar
From SF with Love: AI Agents and How to Build Them
Today I am pleased to introduce today's session, From SF with Love: AI Agents and How to Build Them, and our guest speaker, Yujian Tang. Yujian is a senior developer advocate at Zilliz, and he has a background as a software engineer working on AutoML at Amazon. Yujian studied computer science, statistics, and neuroscience, with research papers published at conferences including IEEE Big Data. He enjoys drinking bubble tea, spending time with family, and being near water. Welcome, Yujian. Thanks, Sachi.
Hello everybody. Today we're gonna be talking about AI agents. So before we get into this, let me take a quick survey of the crowd: who got the reference in the title? If you did, you get extra points; drop it in the chat. Okay, let's get into this.
So my name's Yujian Tang. I am a senior developer advocate here at Zilliz, and that QR code there to your right will lead you to my LinkedIn. If you scan that, you can follow me to keep up with everything that's going on with RAG, AI agents, vector databases, LLMs in general, that kind of stuff. Someone asks, where do I redeem these points? You can redeem them in your mind. They're just extra points, so you know that you are a cultured member of society.
All right, so that's a little bit about me. And then Zilliz is a vector database company. We are the company behind Milvus, which is the world's most popular vector database on GitHub, based on number of stars. So today we're gonna be talking about AI agents and RAG, and specifically we're gonna be talking about how you can build a RAG application in an agentic way, using AI agents with LlamaIndex and Milvus. The first thing we're gonna do is go over a project overview.
We're gonna cover the technology that we're gonna use, the way that we're gonna put this technology together, and then the different pieces of this project. So first we're gonna cover a little bit about RAG: what is RAG? Then we're gonna cover a little bit about AI agents: what are AI agents, how do they work? Then we'll touch briefly on LlamaIndex, which is gonna be the orchestrator that we're gonna use to create these AI agents. And we're also gonna touch on Milvus, which is the vector database that our AI agent is gonna use to do retrieval augmented generation.
And then at the end, I'm gonna walk you through a demo that shows you how you can build this yourself and what that's gonna look like. All right. Step one, project overview. Today we're gonna be talking about how to build AI agents that do RAG, and we're gonna use these three technologies.
We're gonna use Milvus as our vector database. Milvus is particularly aimed at problems of scale, customizability, and continuous usage. And then we're gonna use OpenAI as our LLM. If you haven't heard of OpenAI, drop me a comment in the chat.
You'll be in the rare minority. We're not gonna touch much on OpenAI and who they are; if you haven't heard of them yet, I would suggest Google as a good option. And then we're gonna talk about LlamaIndex, which is the orchestrator, the framework we're gonna use to tie everything together. And this is a very rough architecture of how things are gonna work.
So we're gonna build RAG, retrieval augmented generation, using AI agents, and the way we're gonna do that is laid out here. At query time, at usage time, when you use these AI agents, you take a request and you send it to the whole package. In particular, it gets routed to the LLM, which is represented here by OpenAI. The LLM takes your request, looks at the tools that it has, and then invokes those tools to do something.
In this case, it has these query tools that it invokes to query Milvus. And before usage, what you do is take your data, embed it, and put it into Milvus so that it can be queried. So first, you take your data, embed it, put it into Milvus, and create a tool out of it. Then at usage time, you send your requests to the application, which invokes some of these tools to query against the vector database, do RAG, and give you a response. All right, so let's talk about RAG. Oh, there's a question: what is the advantage of using an OpenAI LLM plus LlamaIndex for agents, versus just using OpenAI agents? LlamaIndex is something that you can use to do much more than just create agents.
It gives you the ability to route many things together. We'll actually talk about this very briefly in the section about LlamaIndex, so let me answer this again later. All right, so what is RAG? If you have not heard of RAG, feel free to drop some questions in the chat. If you have heard of RAG, then this will be a refresher for you.
RAG is retrieval augmented generation, and basically it is something that sounds exactly like what the words imply: you augment the generation of your responses by doing some sort of retrieval. A typical RAG architecture looks something like this. Before you do anything, before you do RAG, you take your data, you embed it, and you put it into Milvus.
The goal of RAG is to be able to access your data with LLMs, and this is basically the first step: take your data and put it into Milvus so that you can use it. Then you tie the LLM and your vector database together using some sort of framework. You actually could do this yourself, manually, with just prompts and queries and things like that, but LlamaIndex is one of those frameworks that provides a really nice way to tie it all together.
So once you have your data in the vector database, what you do is ask your question. The LLM takes your question and goes, okay, what is this person asking? Then it forms that into something that should be searched for. It takes that search string and puts it into the same embedding model that you used to embed your data into Milvus, and it searches Milvus, your vector database. In this case, Milvus is just the example vector database.
It searches your vector database for that embedding, for that vector, finds the closest vectors, and returns those strings to the LLM. And the LLM says, okay, cool, I've seen the context for this. Now I'm gonna structure this in a way that makes sense so I can answer the question, and then I'll give a response to the user. So that's the basics behind a RAG architecture.
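Condensed into code, that whole loop is pretty small. Here's a minimal sketch, not the demo code from later in the session: it assumes a running Milvus instance with an already-populated collection named docs that has a text field, and it uses OpenAI's ada-002 embeddings for both the documents and the question.

```python
from openai import OpenAI
from pymilvus import MilvusClient

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
milvus = MilvusClient(uri="http://localhost:19530")

question = "How big is Seattle?"

# 1. Embed the question with the SAME model used to embed the documents.
query_vector = openai_client.embeddings.create(
    model="text-embedding-ada-002", input=[question]
).data[0].embedding

# 2. Search the vector database for the closest chunks.
hits = milvus.search(
    collection_name="docs", data=[query_vector], limit=3, output_fields=["text"]
)
context = "\n".join(hit["entity"]["text"] for hit in hits[0])

# 3. Hand the retrieved context to the LLM so it can generate the answer.
answer = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)
```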
So: you take your data, you put it into a vector database; you take your question, you ask it to the LLM; the LLM takes that question, searches the vector database for relevant responses, and then gives you an answer back. And that's basically all there is to RAG. If you have any questions about what I just said, please feel free to drop them in the Q&A. Now let's cover what AI agents are. At the core, AI agents are just LLMs that can use tools. That's it.
You can think about tools as just functions. A very basic example would be something like a calculator. If you've been paying attention to the news about LLMs, you've probably seen a lot of noise, a lot of commentary, about how LLMs are particularly bad at math. One way you could fix that is to create a function, or a set of functions: an addition function, a subtraction function, multiplication, exponentiation, whatever. You create these functions, and then you create a string that says: hey, this is what this function does.
This is the input that it takes, this is how it works. And you give the LLM access to invoke these functions, to execute them. Then when you ask the LLM a question like "what is five plus eight," instead of doing the basic reasoning that an LLM would do, it instead goes: oh, I know that I have access to addition, and since this person's asking me to add things, I can invoke the addition function, get my response, and then give you an answer. So that's the basic idea behind it. Essentially, this architecture shows what it's gonna do: you take a request, you send it to the LLM, and the LLM calls some sort of tool set.
The answer from that goes back to the LLM, the LLM then says, okay, now I know what I need to form a response, and it gives you a response. Okay, there's a couple questions here; let me wrap up this section and then I'll address them. So the basic idea is that you can give LLMs access to tools in the form of executable functions, and that is how you can patch some of the things that LLMs can't do and extend the functionality of LLMs.
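Here's roughly what that calculator-tool idea looks like in code. A minimal sketch, assuming a recent LlamaIndex where tools live under llama_index.core (older releases import from llama_index directly):

```python
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

def add(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    return a + b

# The function's name, signature, and docstring become the "string that says
# what this function does" -- the metadata the LLM reasons over.
add_tool = FunctionTool.from_defaults(fn=add)

agent = ReActAgent.from_tools([add_tool], llm=OpenAI(model="gpt-3.5-turbo"), verbose=True)
print(agent.chat("What is five plus eight?"))
```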
And there's a lot of work being done to explore these agentic workflows now. Some real-life examples could be things such as being able to do RAG, being able to search the web, or even being able to create other functions. So those are some examples of AI agents. Let me take the questions now. How can I use a spreadsheet with an LLM to test this process and outcome? First, I'm gonna need you to clarify what you mean by this question. I'm not entirely sure what you want to do with this spreadsheet.
So can you please clarify this question, and I'll try to answer it. Okay, a question about RAG; this goes back to the last section. How do LLMs read the vectors? Do they get a map from the embedding model that is waiting for it in the database? Okay, so let me clarify how RAG works. Let's go back a couple slides here.
I did not put this into my visual here, and maybe that's my fault; I should have a visual that shows the embedding model and where it sits. The LLMs don't read the vectors. The LLM takes the text and tries to do some sort of reasoning on the text. So perhaps the question you asked is: how big is Seattle? The LLM would go, okay, I'm looking for Seattle and size, or something like that, and it would vectorize "Seattle" or "size of Seattle" or something like that.
It would vectorize that using the embedding model: it sends that text to the embedding model, and the embedding model produces a vector embedding. That vector embedding is then what it searches for in Milvus. And Milvus would say, okay, for this vector embedding, I have a sentence stored here that says Seattle has a population of 750,000 people. It returns that sentence to the LLM, and the LLM is then able to take that along with whatever context.
Like, oh, the greater metropolitan area is 4 million people, et cetera, et cetera. And then it returns that answer to you. What are some tool examples? I just gave some tool examples as I was talking about AI agents, for example a calculator. You can also do RAG with tools: you can use query engines as tools. Other tool examples:
you can use something that searches the web, like a web scraper, or calls an API, or anything like that. So for tools, imagine something that you'd otherwise have to do manually, something you'd write a Python function for. Maybe it's: hey, I wanna multiply these numbers, I wanna add these numbers, I wanna do RAG, I wanna create an image, maybe I wanna call another model that will create an image for me. That's an example of a tool.
Are there any models that are best for making embeddings for RAG? Oh, this is back to RAG. Okay. Or are there better models for embeddings, or should you just use OpenAI? Okay, yes, great question. The embedding model that you're gonna want to use is gonna depend on what you want to do. In a basic sense, you can just use OpenAI.
That's probably really good for your POC, just to get something up and running really quickly. But if you really want to do something customizable that you can put into production, what I would suggest is that you go to Hugging Face, you test out a bunch of their open-source models, and you find the ones that are best for your use case. For example, if you're gonna be doing text, you probably want something from Sentence Transformers. If you're gonna be doing images, you probably want something like a ResNet-50 or some sort of vision transformer. So it's gonna depend, and I wouldn't say there are specifically any better or worse models for RAG embeddings, because it's gonna depend a lot on what you're doing.
Some things you are doing might be vertical-specific. Maybe there are models that are for finance or legal or healthcare or something like that, and you're gonna want those specific models. You can either fine-tune your embedding model for that, or you probably want a domain-specific embedding model, and you probably wanna fine-tune the LLM for that. So yes, it's just gonna depend on your use case.
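As an illustration of the Hugging Face route, here's a tiny sketch with the sentence-transformers library; all-MiniLM-L6-v2 is my pick for illustration, not a recommendation from the talk:

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces 384-dimensional embeddings; swap in whatever
# model benchmarks best on your domain and data.
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode([
    "Seattle has a population of roughly 750,000 people.",
    "How big is Seattle?",
])
print(vectors.shape)  # (2, 384)
```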
Can I use a spreadsheet as a tool for RAG, having a product list with prices? If the LLM needs to answer a question on pricing for a product, would it then have the answer? Yes, you can do that; you would just need to format it correctly. For example, you would need to give the LLM instructions on how to search that spreadsheet, or you'd have to have some sort of data converter that can take that spreadsheet and turn it into JSON or something like that. I mean, you could do it with CSVs, but CSVs have been notoriously difficult for LLMs to work with, unless you're using a fine-tuned LLM or a fine-tuned embedding model, because they do not follow the traditional sentence structure of English, or any other language for that matter.
Suppose I have a PDF on which I want to perform RAG operations. Now I have some other text file which says that if you find this in a PDF, this is a red flag or a green flag. How do I make sure the RAG follows this text file and finds these red flags or green flags within the PDF? I'm not sure I fully understand this question; here's my interpretation, and if I'm interpreting it wrong, feel free to update your question. I'm guessing that you have a set of PDFs you wanna search, and you also want to give instructions to the LLM that say: if you find something in this PDF, perhaps the word "swag," then it's a green flag, which means we want this PDF file.
And if you find the words "not swag," or, you know, "swagless," then maybe it's a red flag and we want to ignore this PDF file. Essentially, I don't know why you need a text file; you would probably just pass this in through the prompt. Oh, more RAG questions. Wow, there's a lot of questions about RAG today, guys. This is an AI agents webinar.
Okay, I'm kidding. I can answer your questions about RAG as well. A RAG-specific question: let's say we have a hundred embedded docs with metadata that states which file each embedding comes from, and there are 10 unique files, so 10 embeddings per file. Is there a way to query the top K, let's say three, embeddings per unique file? And in this case, would it return 30 chunks total? You know, that's a good question.
There is a way to get unique files and then get the top three from the unique files. I'm not sure what other vector databases can do this, but I can tell you that Milvus can. You can call this new function called group by, which, if you follow me on LinkedIn, you've probably seen me rage about a couple months ago, because I was like, this is so dumb. But it actually makes sense when you think about unique files, and I think it's really interesting that you've asked this question, which really validates what the team has built.
So you can get unique files from Milvus, and then what you can do is get the chunks for those files and sort those chunks. So the answer would be yes, and it's gonna require more customization than a typical RAG application.
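For reference, here's roughly what that looks like with the pymilvus client. A hedged sketch: grouping search shipped in recent Milvus releases (2.4+), the field names (file_name, text) are placeholders, and by default you get one best hit per group, so the top-3-per-file part still needs follow-up queries (or a per-group size option in newer releases):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# One result per unique file: Milvus deduplicates hits by the group-by field.
results = client.search(
    collection_name="docs",
    data=[query_vector],          # query_vector: embedding of your question
    limit=10,                     # up to 10 groups (unique files) back
    group_by_field="file_name",   # collapse hits so each file appears once
    output_fields=["file_name", "text"],
)
for hit in results[0]:
    print(hit["entity"]["file_name"])
```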
Oh, another question: can we do RAG on an order data table, and how do we keep it up to date? I don't understand this question. Okay. If I want my agent to be able to handle the cases of forecasting, can I do it? I'm gonna need you to clarify this question. Oh, can you do RAG on live-streamed data? Yes, but I don't know why you'd wanna do that. LLMs are not real time; LLMs are slow. You can pipe data into LLMs in real time, but it's not gonna be helpful for you, because the LLM itself is gonna be your bottleneck.
This is not really a question, but here's a comment: if I put too many rules into the system prompt, won't it lead to a lost-in-the-middle problem, or the LLM losing context? I'm worried there's a limit on system prompts. I mean, yeah, but what you can do is tag metadata in your vector database and then just do metadata filtering. So when you do your query, you can filter on specific metadata, and you can implement your rules that way.
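Metadata filtering in Milvus is just a boolean expression over your scalar fields at search time. A minimal sketch (the field names category and timestamp are made up for illustration); the same pattern covers the timestamp-based question that comes up next:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Only search vectors whose metadata matches the filter expression.
results = client.search(
    collection_name="orders",
    data=[query_vector],                               # embedding of the question
    limit=5,
    filter='category == "pricing" and timestamp >= 1704067200',
    output_fields=["text", "category", "timestamp"],
)
```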
Oh, wow, there's another question. I'm getting a lot of questions about RAG already. Okay: let's assume I'm creating a conversational ERP system. I want my order data to be available through conversational interfaces.
How could I go about it? You can just tag your data with a timestamp, and then when you search, you can filter on timestamps, like the metadata filtering I just described. Great. All right, we're gonna skip through that. We're done with this. Okay.
We'll cover a little bit about LlamaIndex. So what is LlamaIndex? LlamaIndex is a framework for building LLM applications. If you've heard of LangChain, it's somewhat similar. LangChain is focused on how you orchestrate and chain together LLM inputs and outputs; LlamaIndex is focused on how you do better data retrieval and get that into LLMs.
LlamaIndex also has integrations with many popular tools, including Milvus, other vector databases, other embedding tools, other LLMs, all of these different things. Okay, so what is Milvus? Milvus is a distributed vector database that's optimized for large, high-scale use cases. Basically, if you're building with more than a billion vectors, Milvus is gonna be the best vector database for you.
We have 50-plus projects with over a billion vectors in production, and actually that was last year, so it might be more than that now; I would bet it probably is. A couple other things Milvus does really well: it has a really highly customizable set of indexes and distance metrics. If you have more questions about indexes and distance metrics, I have a lot of videos and articles about that. But essentially, indexes are the ways that you organize your data and how you're gonna search your data.
And distance metrics are the ways that you compare how far apart your data is. There's a bunch of different ways to do this, and it's actually really, really important to your application that you use the right indexes and distance metrics for your use case, because there are trade-offs for each of these. And finally, Milvus has a lot of enterprise-ready features: role-based access control, multi-tenancy, all the really boring stuff that nobody really wants to hear about, unless you do want to hear about it, in which case you should probably come to another webinar. Okay.
So let's also take a look at what data in your vector database actually looks like, then we'll take a look at the architecture, and then we'll go to the demo. This is what vector data looks like. This is an entry: you need an ID and you need an embedding. Your ID can be customizable, and your embedding is just gonna be a series of numbers. So this is what a vector looks like; I get a lot of questions on what a vector looks like.
Here it is: it's a bunch of numbers. That's it. Okay. And the rest of it is what we call metadata. Metadata is essentially data that you tag onto your vector and your ID so that you can use it at query time, or filter with it like I was mentioning earlier.
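So a single entry is just an ID, a vector, and whatever metadata you tag on. As a sketch, inserting one entry with the pymilvus client might look like this; it assumes a docs collection already exists with matching fields, and the field names are illustrative:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

client.insert(
    collection_name="docs",
    data=[{
        "id": 1,                                 # customizable primary key
        "embedding": [0.02, -0.41, 0.13] * 512,  # the vector: 1536 numbers (placeholder values)
        "text": "Seattle has a population of about 750,000 people.",  # metadata
        "source": "seattle_facts.pdf",           # metadata you can filter on
    }],
)
```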
Okay, so let's also take a brief look at Milvus's architecture. Milvus is a distributed system, and the input/output is loosely coupled, in a sense. What we do is model Milvus as a pub/sub system that uses a stateful coordinator service that spins stateless nodes up and down. And probably the most important thing for you to understand, in terms of vector databases in general and scaling, is why you would have a pub/sub system like this. The main reason is that you want to have consistency across replicas and across instances, and read consistency and write consistency are two different things that are both incredibly difficult to do at scale. Milvus automatically handles a lot of this for you through four consistency levels that you can choose based on what you need.
This is, once again, in response to the person who asked about real-time data streaming: you can stream data in real time into Milvus. We use Kafka or Pulsar for that, and when and where you can access that data is gonna depend on your consistency level. You can have everything read-after-write, which is called strong consistency, all the way down to something loose, what we call eventual consistency, which just does everything in the order the system receives it.
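In the Python client, the consistency level is just a setting you pick when you create a collection. A sketch, assuming the four levels Milvus documents (Strong, Bounded, Session, Eventually):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# "Strong" = read-after-write; "Eventually" = loosest and fastest.
client.create_collection(
    collection_name="docs",
    dimension=1536,
    consistency_level="Strong",  # or "Bounded", "Session", "Eventually"
)
```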
Some other things to be aware of are the separation of these nodes here. We have a query node that handles how queries are done, a data node that handles data ingestion, and an index node that handles how the data is organized. So why are these nodes separate? Why do we have a separate node for each of these functionalities of the vector database? The main reason is that you are never going to see all three of these scale at the same time. You're never gonna have something where, hey, I'm ingesting 300 million documents and I'm also querying against 300 million documents at the same rate. Typically, each one scales differently.
And this is particularly important because it helps you optimize your resources, keeps your costs down, and also just lets you have more control over the functionality of your vector database. Something that isn't on the architecture diagram, but is also important to understand, especially for people whose data is gonna change often, is that Milvus indexes your data into segments, and the default segment size is 512 megabytes. As you get more and more data, you're probably gonna want to make that segment size larger. The idea is basically that if you index and query on these small segments, you're gonna get much, much faster results than if you were to index on one huge segment. If you just think about it for a second, the big O of parallel-searching 200 segments of the same size is gonna be smaller than the big O of saying, hey, I have this huge amount of data that I want to go through linearly, right? And then the other thing is that as you change your data, as you update your data, as you add more data, you're gonna want to do this thing called re-indexing.
If you were to build just a proof of concept, and you just wanted to put data in and say, hey, I just want you to index over this data, then none of this matters. But in production, in real-life usage, the reality is your data changes a lot, quite often: by the minute, by the hour, by the day, by the week, by the month, whatever. Your data's always changing, and you're always gonna need to update it. And the reality is that as you update this data, it becomes incredibly hard to index it well unless you have a system that handles that for you. That's something that Milvus does as well. And now we're gonna get into the demo.
Before we get into this, if you have enjoyed this presentation so far, I'm gonna ask you to do me a favor: scan that QR code and go give Milvus a star. My boss will be very happy about this. I'll also pause here to answer some questions. If I want my agent to handle the cases of forecasting, can it do so, based on the order of data? Wait, you asked this question already, and the answer is yes, you can. How do I structure different formats of data while performing embedding? Suppose my PDF contains text data and tabular data.
How can the embedding be performed? What do you suggest? So for PDFs specifically, I'm gonna refer you to this thing called LlamaParse; tell the LlamaIndex people I sent you. LlamaParse is their dedicated PDF parsing tool, and it's really fast and works really well. I like it for PDFs. I'll probably have some projects that showcase it later, but it's something that helps you parse PDFs so you don't have to worry about any of this stuff. Basically, can Milvus solve my problems? Yes, Milvus will solve all your problems for you.
You'll never have to think about anything again; Milvus can do everything for you. No, I'm kidding. But I don't understand this question. Can you compare LlamaIndex and LangChain a bit more in depth? Should we choose LlamaIndex over LangChain to work with Milvus? These are fundamentally different libraries. Yes, they're in the same category of library.
They're both there to help you orchestrate and create LLM applications, but fundamentally they have different focuses, and I would suggest you use the correct library for the correct focus. If you want to do something that's going to involve a lot of retrieval, a lot of data going in and out of your LLM, then you should use LlamaIndex. If you're gonna do something that's more about how your LLM functions and what you do with the inputs and outputs of your LLM, then you're gonna wanna use LangChain. So these two libraries can perform a lot of the same functionality, but are actually organized completely differently.
They have different bases of organization, so depending on what you wanna do, you should choose your library accordingly. What is the main benefit of Milvus versus a relational database? Okay, so relational databases don't do what vector databases do. And actually, I'm gonna go back a slide here: "vector database" is kind of a bad term.
It's just what everybody tends to call them, but what they actually are is a compute engine. If you want to, you can build a compute engine on top of your relational database, but why go through all that trouble when you're gonna run into performance issues? You won't have the ability to scale, the ability to spin up multiple instances; you won't have proxies, you won't have the ability to separate your concerns, and all these different things. Look, if you have a thousand pieces of data, it doesn't matter what database you use; it's totally irrelevant. But when you wanna go into production and you have something that's gonna scale, that's when you want something that's purpose-built for your use case.
And Milvus is primarily built to serve the use case of people who are really gonna take things into production and are gonna be touching huge, huge amounts of data. That's when Milvus is gonna be useful for you. If you're just playing around, it doesn't really matter; play with Milvus for fun if you'd like. Can agents perform multiple tools, with RAG operations in between tools? Yes.
You just have to structure it that way. Can Milvus support multiple embeddings in the same index? No, you cannot do that, because embeddings can only be compared if they're the same length. If embeddings are not the same length, they cannot be compared mathematically. I want to have an LLM and custom data generated by the user; is RAG good for that? Yes.
Why would you use Milvus instead of Pinecone, Qdrant, or Weaviate? I think I just answered this question earlier, and Milvus is the only one that's distributed. Really, you can use any of these. Like I said, if you're playing with small amounts of data, use whatever you want; it doesn't matter. You can create your own vector database, your own vector search engine, and use that. It really doesn't matter when you're working with small amounts of data. It only becomes something that matters when you are looking to build something that's gonna go into production and that's gonna be enterprise-ready.
Great. Okay, so let's do a refresher on what we're gonna look at, and then we're gonna get into the demo. Remember, we're gonna build RAG with an AI agent. I know we just went through a lot of questions about RAG, and briefly touched on AI agents and how things work.
But in this specific example, what we're gonna do is build an AI agent that does RAG for us, okay? And we're gonna use OpenAI because it's easy, because it's simple, because it's there. And we're gonna use some PDFs. So those of you who were asking about PDFs, very relevant questions: we're gonna be using the 10-Ks of Uber and Lyft. Okay? So let's get to it.
All right, so here we're gonna look at this. Let me drop the link to this below so you guys can get it. This is one of the notebooks in my AI agents cookbook repo. Essentially what we're gonna do here, like I said earlier, is create an AI agent that does RAG, and we're gonna do RAG on the PDFs of the 10-Ks of Uber and Lyft. Oh, before you jump into the demo, can you give us a light recap about AI agents? Is it just executable functions? The answer is yes.
Basically, that's how it works. It's a little bit more complicated than that, but you'll see in the demo how it works. All right, let's jump into it. The first thing we're gonna do here is download a bunch of data. And actually, before we do that, we're also gonna run docker compose up -d, and you can get the Docker Compose file for this in the repo.
I've dropped the link in the chat, so you can go there, get that Docker Compose file, pull it, and run this. I've already run my Docker Compose file, my container is already running, so I'm not gonna run this piece. All the data is also in the repo, so you actually don't need to run this stuff, but if you wanna download it yourself, do it from scratch, you can do all of that. The core imports we're gonna look at here are SimpleDirectoryReader, which is LlamaIndex's way of reading in a directory.
Then VectorStoreIndex is their way of accessing vector databases, and StorageContext is the way that you pass around different ways to store data; it's the context for storage in LlamaIndex. And then we're also gonna get the QueryEngineTool. So, readdressing someone's question about executable functions: inside of LlamaIndex, as well as inside of LangChain, by the way,
the way that you access tools, the way that you access these functions, is based on a specific type of class object that they have made. And ToolMetadata basically gives the LLM some metadata about your tool, so that it understands how to access and use your tool. And then there's MilvusVectorStore, which is gonna be LlamaIndex's way of interacting with Milvus. This next section is probably gonna take the longest. Basically, what we're doing here is getting all the data, vectorizing it, and putting it into a vector database.
So let's step through this step by step. The first step is we're going to call the SimpleDirectoryReader, and we're gonna use that to read in these files. In this case, we've got a 10-K file from Lyft from 2021, as well as a 10-K file from Uber in 2021, and we're gonna use the SimpleDirectoryReader to read in both of these files as sets of documents. We've called these the Lyft docs and the Uber docs, and now we're gonna build indexes: in this case, one for Lyft and one for Uber.
Here you'll see that I passed in 1536 as the dimension. Dimension just refers to the length of the vector, and this is gonna vary depending on your embedding model. In this case, because we're using the default embedding model of OpenAI, which has a dimensionality of 1536, we're gonna pass in 1536 as our dimension.
Next, we're gonna name our collections. In this case, I've just named the Lyft collection "lyft" and the Uber collection "uber," and overwrite=True just means: hey, if I run this again, I want you to overwrite the database. And now we're gonna give it the storage context. So here we've created these connections to the vector database, and now we're going to use a StorageContext so we can pass this connection around inside of LlamaIndex. We're just gonna say, hey, we want to pass this vector store, and the vector store we're applying is Lyft or Uber.
And then next we're gonna create a VectorStoreIndex, which is the way that LlamaIndex interacts with these vector stores and the way they're indexed. Vector stores can create multiple indexes for multiple types of vectors. Milvus has recently introduced this thing called multi-vector search, which allows you to have multiple vectors inside of the same entry. You can't compare these vectors to each other; you can't compare vector one and vector two, but you can compare vector two across a set of vectors, and in that case, that would be a different vector index than vector one.
And now we'll pass in these Lyft docs and Uber docs, and we will create the index on them. You'll notice that up to here, all we're doing is creating the connections to the vector store, to Milvus, and here is where we actually create the embeddings for the documents and insert them into Milvus. So this step here is actually the step that makes this function take so long to run. And then we're just gonna persist this into certain directories. This is optional, but essentially it allows you to say: hey, if I come back and I want to work with this again, I can load it from these persisted directories.
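Pulling those ingestion steps together, the notebook's flow looks roughly like this. A sketch, not the verbatim notebook: it assumes recent LlamaIndex package layouts and the 10-K PDFs downloaded to a local data/10k/ folder.

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

# Read each 10-K PDF in as a set of documents.
lyft_docs = SimpleDirectoryReader(input_files=["./data/10k/lyft_2021.pdf"]).load_data()
uber_docs = SimpleDirectoryReader(input_files=["./data/10k/uber_2021.pdf"]).load_data()

# One Milvus collection per company; 1536 = OpenAI's default embedding dimension.
lyft_store = MilvusVectorStore(dim=1536, collection_name="lyft", overwrite=True)
uber_store = MilvusVectorStore(dim=1536, collection_name="uber", overwrite=True)

# Embed the documents and insert them into Milvus (the slow step).
lyft_index = VectorStoreIndex.from_documents(
    lyft_docs, storage_context=StorageContext.from_defaults(vector_store=lyft_store)
)
uber_index = VectorStoreIndex.from_documents(
    uber_docs, storage_context=StorageContext.from_defaults(vector_store=uber_store)
)

# Optional: persist so the indexes can be reloaded later without re-embedding.
lyft_index.storage_context.persist(persist_dir="./storage/lyft")
uber_index.storage_context.persist(persist_dir="./storage/uber")
```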
Now we want to take these indexes and turn them into query engines. A query engine is just a way to interact with a vector database and search it. The main thing we're gonna apply here is similarity_top_k, and what this means is that we get the top three results back: similarity_top_k equals three, and that's all this means, okay? And now this is where we start creating the tools.
This is how we pass in the ability to execute these tools. What we do is take these query engines and turn them into QueryEngineTools. We called the query engine for Lyft the Lyft engine, and the Uber one the Uber engine, and we need to pass in the query engine for the QueryEngineTool to have the ability to query the vector database. Then we also need to pass in some metadata, and the metadata here is actually how the LLM understands how to use this tool.
So yes, in a sense this is just executing a function, but we give it a name so the LLM knows why it should call it, and then we give a description. In this case we say: "Provides information about Lyft financials for the year 2021. Use a detailed plain text question as input to the tool." So in this description, we've told the LLM what the tool does and how it can use it. This is important because LLMs are doing reasoning on text, and this is how the agent is able to understand which tools to use.
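Here's a sketch of that tool-building step, using the tool names and the Lyft description read out above (the Uber tool is symmetrical):

```python
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Query engines: each returns the top 3 most similar chunks from its collection.
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)

query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for the year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for the year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
```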
Did I already run this? Okay, I already ran this. Okay, now we're gonna create the actual agent. In order to do that, we're gonna import this thing called the ReActAgent, and we're gonna import OpenAI. We've actually already used OpenAI to embed everything into the vector database, but now we're gonna import OpenAI as the LLM. ReAct is just reasoning plus acting: ReAct. Okay? And then here we're gonna pick the model. We're just gonna say, hey, the LLM that we want to use is gpt-3.5-turbo-0613, which is just a version of GPT-3.5. If you have access to GPT-4, you can use GPT-4. And then we'll create the agent, which is gonna be the ReActAgent, and we're gonna create it from a set of tools.
We created this list of tools earlier that we called query_engine_tools. If you wanna do things other than RAG, other than querying, you can pass in other tools here. Then we pass in the LLM, and we set verbose; we're gonna do this in a verbose manner, which means that when we call the LLM, it's gonna print out its reasoning. So we're gonna call the LLM, and we're gonna hope that it gives us Lyft's revenue growth in 2021.
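That last step is only a few lines. A sketch, matching the model version mentioned above:

```python
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-0613")

# verbose=True prints the agent's thought / action / observation loop.
agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)

response = agent.chat("What was Lyft's revenue growth in 2021?")
print(response)
```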
Okay? Great. And so this is a function of the verbosity. It says: I can use the lyft_10k tool to find information about Lyft's revenue growth in 2021. The action it's gonna take is lyft_10k, and the input to that action is the question: what was Lyft's revenue growth in 2021? And then it gives you the observation that it makes here: Lyft's revenue increased by 36% in 2021 compared to the prior year.
Then it takes that observation, goes back to the LLM, and reasons: I have the information I need to answer the question. Here's the answer: Lyft's revenue growth in 2021 was 36%. Okay? So that is the base of the demo.
That's really it. I understand there was a lot of code to get through rather quickly, so if you have any questions about that, we can answer them. I see there are also two questions in the Q&A already that I will go ahead and answer. Any tricks on how to get relevant entries from Milvus for questions like "what is the biggest, smallest, et cetera" — questions where an unknown number of entries have to be compared to get the right results? In this case, you probably wouldn't want to just use vector search. Milvus is a vector database,
and vector databases are ways to find semantically similar data. In this case, what you are asking for is actually not semantic similarity, but something that can be done through comparisons, which is much more easily done through something like a Python script. However, the way to do this in Milvus is to use metadata filtering. Maybe you want the longest entry or the shortest entry, or the entry with the most commenters or something like that. In that case, what you would do is use a metadata filter to set a boundary, and then you would use some sort of re-ranking mechanism to rank these in whatever order you need. Say, the longest comment about, I don't know, elephants:
in that case, you would search for elephants, get back a bunch of results, and then find the longest result. Something like that. Can you use a different embedding model, even if you're working with an OpenAI LLM? Yes, you can. And can you explain multi-vector again? Okay, multi-vector is: if you have multiple ways to represent an object or an entry, then you can have multiple vectors that represent that entry. For example, let's say you have three different embedding models, and they have different dimensionalities: 384, 768, 1536.
What you could do is store all three of these embeddings, and then when you want to compare, you could compare based off any of them, but you have to ensure that you compare on the same vector embedding each time. So you can't compare the 384 to the 768 or the 1536, but you can compare different 1536-dimensional vectors to each other. Where will I be able to find this recording? Sachi, this one's for you. We'll send you the recording if you signed up through email, and it'll also be on YouTube shortly after the webinar is over. How do the tools work under the hood? Does usage of a tool by the LLM mean that the LLM sends back an answer with the name of the tool and the search text, and then the library handles that on my machine and sends the answer back to the LLM?
Okay, wait, let me reread this. So the LLM basically sends the inputs into the function and then gets the output out of the function. I think the question you're asking here is: where does the function execute? The function will execute wherever it is hosted, either server side or client side. So if you're running it on your machine, it will execute on your machine.
Yes. Would you have a separate index for each embedding model in the multi-vector scenario, in one vector database? Yes, you would, because indexes are the ways that you access the data, and you can only create indexes on vector embeddings of the same dimensionality, because you need to be able to mathematically compare the vectors. And yes, you could put it all into one vector database; at least you can do that with Milvus. Once again, I can't comment on what other vector databases can do.
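To make the multi-vector idea concrete, here's a hedged pymilvus sketch of a collection with two vector fields of different dimensionalities, each getting its own index. It needs a Milvus version with multi-vector support (2.4+), and the field names are made up for illustration:

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")

# One entry, two representations: a 384-dim text vector and a 1536-dim one.
schema = MilvusClient.create_schema(auto_id=False)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("minilm_vec", DataType.FLOAT_VECTOR, dim=384)
schema.add_field("openai_vec", DataType.FLOAT_VECTOR, dim=1536)

# Each vector field gets its own index; searches compare like with like.
index_params = client.prepare_index_params()
index_params.add_index(field_name="minilm_vec", index_type="AUTOINDEX", metric_type="COSINE")
index_params.add_index(field_name="openai_vec", index_type="AUTOINDEX", metric_type="COSINE")

client.create_collection("multi_vec_docs", schema=schema, index_params=index_params)
```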
Okay, something about low code. So, yeah, you can use low-code: go to Zilliz Cloud, sign up, and just upload a bunch of documents. Zilliz Cloud now has this thing called Pipelines: all you gotta do is upload your documents, and it'll embed them for you and put them into a vector database for you. But if you don't know how to program, I would consider learning how to program.
I don't know if you saw the thing about Devin, but they were like, oh, we solved like 13% of issues, and then recently people found out that that was not really true. So programming is not going away anytime in the near future, and if you want to be able to build custom things, you're gonna have to learn how to program.
By the way, that's really my only real comment about this: no-code or low-code is just someone else's code. So if you want something customizable, you're gonna have to make it yourself. Do you think using multiple query engines is a good replacement for the Milvus group-by function you mentioned earlier? These are fundamentally different use cases.
Multiple query engines are for if you want to query multiple databases, and group by is for if you want unique documents from the same set of documents stored in one vector collection. So I guess my answer is no; these are different use cases. Cool. I think we are able to wrap this up in a good amount of time.
It looks like we've been able to address all of the questions, we still have five minutes left, and we got to run through the entire code. So I'm happy to wrap this up here. If there are any other questions, I think we can give it like 30 seconds, and, you know, speak now or forever hold your peace. Yeah. Anyone have any last-minute questions to ask Yujian before we wrap this webinar? We'll just wait a few seconds here.
Also, again, this recording will be available to you shortly after the webinar, so stay tuned for that. Looks like you have one question: show your LinkedIn QR code again. Yes, I can do that. We also have a Discord channel.
I will put that in the chat again for anyone who missed it earlier. Okay, there it is. You can scan that, and you can find me on LinkedIn. Can we contact you for further questions later? Yes, go find me on LinkedIn.
You know, comment, tag, make a post, tag me in the post; I can answer your questions. If you were to play devil's advocate against using vector DBs for efficient RAG, what would you say? You can't do RAG without vector databases, so I don't know how to answer this question, to be honest. I really don't know; the whole concept of RAG relies on you having a vector database. Okay, I think that is a wrap.
Yep. Oh, never mind, there's another question. Okay. For the tools: could an agent read an email and synthesize information from it, and then go search the web and take an action? Yes.
Yes, it can. You'll have to have multiple tools, each of which does one of these things: for example, an email reading tool, a web search tool, and an action tool, and then you can have all of these chained together. Or I guess you could have it all in one function.
But the answer is yes; it just depends, and you can do it in many ways. All right. Before we say this is a wrap, again, I'm gonna let someone else finish typing their question, and, uh, oh, there.
Yeah, here's another question. If I had all the unique files in separate indexes, so I wouldn't be able to use a group by on one index, but I wanted to go over multiple indexes: could this multiple-query-engine approach work to check multiple documents instead of using group by? Okay.
Wait: unique files in separate indexes, so you wouldn't be using group by? Yes, yes, you could do that, and you would probably just want to pass a different index to each query engine. Okay, any last-minute questions here? Okay, it looks like we're good. Oh, what's this? Oh, okay.
Okay. Oh, okay, this was just a thank-you for the session. Well, thank you, Alex. Okay.
Thank you all for joining us today. Have a good rest of your day. Bye.