Webinar

Build hallucination-free semantic search applications with OriginTrail and Milvus

Zilliz Webinar - Zoom

About the Session

One of the most important things about training LLMs is ensuring that the data and information are reliable. OriginTrail is building a Decentralized Knowledge Graph with semantic search on Milvus to address this issue. This webinar will introduce how these technologies can be used to develop more reliable AI systems.

We’ll focus on using Trusted Knowledge Assets and Extractive Question Answering, all powered by the Milvus vector database and OriginTrail Decentralized Knowledge Graph.

You'll learn:

How to create and own trusted Knowledge Assets using OriginTrail.
Techniques to enable semantic search of Knowledge Assets by using Milvus.
An overview of tools and software development kits (SDKs) for developers eager to create their own reliable knowledge applications.

Transcript

Good morning, good afternoon, and good evening, everyone. Thank you so much for joining us for today's session, Build Hallucination-Free Semantic Search Applications with OriginTrail and Milvus. I'm Yujian Tang, and I'm a member of the team here at Zilliz. I'll cover a few housekeeping items first, and then we'll get right into the session.

This webinar is being recorded, so if you have to drop off at any point, you will get access to the on-demand version within a few days. If you have questions, feel free to paste them into the Q&A tool at the bottom of your screen. As for upcoming events, we have one on September 21st, presented by me, about doing citations and attributions with retrieval-augmented generation. And then on the 28th, we'll have the co-founder and CTO of Cradle AI come present about Milvus in action. That's all for housekeeping.

Today I'm pleased to introduce today's session, Build Hallucination-Free Semantic Search Applications with OriginTrail and Milvus, and our guest speaker, Branimir Rakić. Branimir is the founder and CTO of Trace Labs, the core developers of OriginTrail. Branimir has been the architect and creator of the OriginTrail Decentralized Knowledge Graph (DKG), designed for making AI-grade knowledge assets discoverable and verifiable across the web. Branimir has been particularly focused on networking problems throughout his career, spanning decentralized networks: apart from the DKG, he has also worked on Ethereum and Polkadot, semantic networks, and even mobile networks. Being a strong advocate of open source and permissionless systems, Branimir is active in standards development working groups within GS1 and various blockchain communities. He is a member of the Serbian Entrepreneurs collective and an alumnus of the H-Farm accelerator. Welcome, Branimir.

Thanks a lot, Yujian. Hello, everybody. Good morning, afternoon, or evening, wherever you are around the world. I'm really happy to be here and present what we've been working on lately within OriginTrail, in conjunction with Milvus. So I hope this presentation is interesting to you.

I suppose it's a bit different from what you'd usually see on Zilliz webinars because of the trusted blockchain and knowledge graph angle. So I'll dive a little deeper into how that works, but I'll try to stay high-level so it doesn't get too complicated. Feel free to post any questions in the Q&A, and I'll try to address those together with Yujian, I suppose, at the end. So I guess we can get started. The focus here: we're going to be talking about hallucination-free semantic search apps with OriginTrail and Milvus.

I'm going to start by introducing the high-level problem here and why we're in this space at all. I'll just highlight a person I'm sure pretty much everybody on this call knows, the so-called godfather of AI, Geoffrey Hinton, who left Google recently, about half a year ago or something like that, because of the dangers ahead. I'm sure everybody's aware of the whole boom that happened with large language models lately, and he's credited as one of the people who pioneered this idea. Now he seems to be very worried about what can happen. A very big problem here is that LLMs have this weird notion of hallucination, really just sort of guessing something, shooting out some output that might not be correct at all or in any way verifiable. And that's already taking place in the world. We are living in this sort of post-truth world, right? People are labeling things as fake news, and there are all these new labels for things that we didn't hear of many years ago.

And now with AI that can hallucinate and is becoming a bigger part of content creation around the world, the problem just becomes bigger. There's been, for example, the case of a certain lawyer who did legal research with ChatGPT. Probably it wasn't ill-intentioned; more likely the person was in a rush. We're all in a rush today, right? And he ended up citing a couple of non-existent cases in the filing.

So he ended up getting in trouble with the court himself, because ChatGPT just invented a bunch of these citations that led nowhere; they were nonexistent. This is just one of the real, tangible examples of this happening. But it's not just that it's happening; it's also a big blocker for the adoption of AI, especially within enterprises. That's what we're hearing from our enterprise partners: quite a few of them are interested in using technologies like LLMs but, on the other hand, are quite blocked by this notion of, you know, well, it can just spit out anything.

So, from an enterprise-grade perspective, this is still not something that is really easy to tackle. I suppose this whole problem statement is quite understandable to everybody by now. Essentially, we see that with this overflow of misinformation, trusted data really becomes the cornerstone for all of humanity, but especially for security and, really, for further prosperity: convergence towards more trusted and reliable systems rather than diverging into chaos. Therefore, at OriginTrail, we think there's another knowledge revolution happening right now. We've seen three so far. The first one was kickstarted by the printing press.

We had the problem of knowledge scarcity. Knowledge was basically managed, well, in Europe, mostly within entities like churches and rare libraries around the world; it was very concentrated in a small number of places. Then we get the printing press. All of a sudden, replication comes as a solution to scarcity, and voilà, the world changes. Literally, a renaissance happens, and the world becomes totally different and more advanced. And then we have, obviously, the big revolution that happened with the internet, which solved the fragmentation of knowledge. Knowledge was available around the world, but with connectivity as a solution, we got networks, we got the World Wide Web.

And then, as it's explained today, we have Web2 and Web3, which advanced that concept to the next level, where we basically solved this second problem, the problem of fragmentation, and got an amazing knowledge revolution happening over the course of the past several decades. Right now, we're in the middle of a new one. This new revolution, driven obviously by AI, has a problem that it immediately puts in front of everybody's face, which is this trust problem. How do we trust the information that we're seeing, and how can we verify that what we've somehow reached or discovered is indeed something we can trust? We believe that the solution to this problem is really decentralized AI.

At OriginTrail, we've been building technologies based on decentralized networks for about seven years now. We actually started way before that, building centralized proprietary systems. Then, around 2017, we realized that we should really expose this technology to as many people as possible, so we went from centralized and proprietary to decentralized and open source. Today I'll explain a bit more about OriginTrail, but in the context of this grander scheme of things: the knowledge revolution we're in today is, we believe, going to result in decentralized technologies supporting the exploding, amazing world of AI as a kind of trusted infrastructure underneath.

So essentially, at OriginTrail, we like to say that knowledge is a new asset class, and therefore OriginTrail is building technology to let you seize it. Today I'll explain the concepts behind OriginTrail and how it ties in with Milvus in particular, as well as with AI. And I'll show a small demo application just to illustrate how this works in a couple of different cases. But before that, I'll have to give a bit of an introduction to OriginTrail so that it's actually possible to follow what I mean by trusted AI infrastructure. Just as a note, OriginTrail is not some laboratory technology.

It's actually been used for many years by businesses around the world. Some of the examples are shown here on the screen. I won't go too deep into them; if you're interested, you can find a lot more information on our website. I'll just highlight a few. One is the trusted factory case, where the British Standards Institution, together with the biggest US retailers, such as Walmart, Target, and Home Depot, has built a system on top of OriginTrail to share security audits of factories around the world, covering 40% of US imports. All of this is built on the OriginTrail Decentralized Knowledge Graph. Rail travel safety is another case, where we've worked together with the Swiss railway company for many years. We're basically tracking all kinds of things within their railway supply chain, including trains across Europe, really minimizing the safety risk.

If you're in the US, or where I come from, in Serbia, we had a couple of rail incidents in the last year or two that were really catastrophic, both for the environment and for people. These types of cases are where trusted knowledge sharing can really help, because it's actually a big problem for this old-school industry, in a way, to aggregate all of this information and to access it in a trusted way. Food and beverage traceability is also one of our cases. I'll show you something regarding that today, but basically we've been very active in this industry, following things from whiskey to perishables and dairy products, and, since a couple of years ago, in pharmaceuticals as well. Systems built on top of OriginTrail have been used to track vaccines and medicines all the way from the US to India and make sure that they really reach the patients rather than end up on some form of black market. They like to call it diversion, but I like to simplify it and say it's stealing.

So the technology helps make sure that this doesn't happen. These are just some of the cases; there are other applications as well. You can, like I said, check them out on the website, origintrail.io. So I'm going to try to explain how that works. At the core of OriginTrail are these things called knowledge assets.

Like I said, knowledge is a new asset class, right? So I'll try to explain, at a high level, how these knowledge assets work: what are they, and what can you do with them? I'll explain with a few slides, and then I'll show very practically what one looks like. Knowledge assets are containers for knowledge; they're basically verifiable boxes for knowledge, if you like. What you can do with them is create and own them. If I were to create a knowledge asset right now, I could take some knowledge from a company, or personal knowledge, let's say from a device like a smartwatch, and I could structure it in a knowledge graph way (I'll show how that works in a minute) and create this knowledge asset on the OriginTrail Decentralized Knowledge Graph. When I do that, an NFT actually gets created, and this NFT is there to represent the knowledge asset.

So in a way, it is the container for this knowledge asset, meaning that it signifies that I'm the owner of this knowledge, and it enables me to modify this knowledge. Only the person who holds this NFT is able to update the knowledge asset over time. In this brief animation here, you can see a knowledge asset created with an ERC-721 NFT attached. This is not a monkey-picture NFT. The point of the NFT is not to look cool or make you part of some community; rather, it's a technical component, used literally for the administration of this knowledge asset.

So you wouldn't normally see a person running around with a shirt of it, but you might expose this NFT to any type of knowledge marketplace or NFT marketplace, where you could create very rich knowledge assets and then immediately utilize all of this blockchain and Web3 infrastructure, because of the standards, the wallets, and everything that enables it. Once you create and own your knowledge asset, it becomes discoverable in the Decentralized Knowledge Graph. It basically gets this Uniform Asset Locator (UAL), which is a successor of the URL, the Uniform Resource Locator, and which enables you to really load this knowledge asset. This animation briefly shows it; I'll show it again later. Essentially, whoever has a UAL can access this knowledge asset and see all of its contents. And apart from seeing all of the contents, you can also verify the origin and the trail of this knowledge asset. What that does is enable you to see who was the issuer of this knowledge, which is really a set of statements.

And then you can also verify the integrity of the information that you read. Trust is, like I said, a big component of this, so even the interface itself, the website or mobile app that you might be seeing this through, cannot trick you. In the transmission from the original source to whoever is querying this knowledge asset, there's no way to modify the information or do some sort of man-in-the-middle attack, because whoever is querying at the end is able to verify the integrity on the blockchain. I'll explain that in one of the next slides. But essentially, the point is that both the origin, the source, and the trail of updates, including the creation, are something you can see on the blockchain, with tamper-proof blockchain proofs.
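
The ownership and integrity ideas described so far can be illustrated with a small sketch. This is not the OriginTrail implementation: the `ToyLedger` class, the use of SHA-256 over sorted JSON, and all names and values are assumptions made purely for illustration. It only shows the shape of the mechanism: an owner-gated update trail plus a hash that any reader can recompute and compare.

```python
import hashlib
import json


def state_hash(knowledge: dict) -> str:
    """Hash a canonical JSON serialization (an assumed stand-in for the real proof)."""
    canonical = json.dumps(knowledge, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToyLedger:
    """Hypothetical in-memory stand-in for the on-chain record of one or more assets."""

    def __init__(self):
        self.owner = {}  # token_id -> current holder of the NFT
        self.trail = {}  # token_id -> list of (issuer, state hash), oldest first

    def create(self, token_id, issuer, knowledge):
        self.owner[token_id] = issuer
        self.trail[token_id] = [(issuer, state_hash(knowledge))]

    def update(self, token_id, sender, knowledge):
        # Only the holder of the NFT may publish a new state.
        if self.owner[token_id] != sender:
            raise PermissionError("only the NFT holder may update the asset")
        self.trail[token_id].append((sender, state_hash(knowledge)))

    def verify(self, token_id, knowledge) -> bool:
        """True if the content matches the latest recorded state hash."""
        return self.trail[token_id][-1][1] == state_hash(knowledge)


ledger = ToyLedger()
asset = {"name": "Example event", "location": "Example venue"}
ledger.create(1, "alice", asset)

tampered = dict(asset, location="Somewhere else")
print(ledger.verify(1, asset), ledger.verify(1, tampered))
```

A reader who receives the content through any website or app can recompute the hash locally, so a modified copy fails the check regardless of who served it.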

Finally, when you have all of this, you can use these knowledge assets to build trusted AI systems. That means that whenever your AI system queries the knowledge assets, you can see the source, you can verify the integrity of the trail, and you can make all kinds of interesting queries, because you essentially have this assembly of trusted knowledge assets. I'll show that today. If this wasn't 100% clear or deep enough, that's for a reason. Trust me, I've been giving this presentation for a while, and because it combines blockchain, AI, and knowledge graphs, it can get a bit confusing.

So I'll try to reiterate some things over the course of the presentation. I see a few questions that I can maybe answer quickly. There's a question: which chain are the NFTs created on? It's a very good question. I'll quickly answer it and then add some more context later. OriginTrail is actually a decentralized knowledge graph.

It's not a blockchain network per se; rather, it's a multi-chain decentralized knowledge graph that connects to different blockchains. When it was launched, in 2018, it was launched on Ethereum. Then we expanded the network to a couple of other blockchains, such as Polygon and Gnosis, and more recently, about a year ago, also to a Polkadot parachain that was custom-launched for OriginTrail. It's called the OriginTrail Parachain.

However, long story short, the point of this technology is that it's neutral, so that it lets each blockchain community that wants to integrate this technology do so. So the answer is: whatever blockchain supports the Decentralized Knowledge Graph is where you can mint your NFT. When you create a knowledge asset, you pick; you say, I want this knowledge asset NFT to be minted on Ethereum, or on Gnosis, or on any other chain. That's the quick answer, and I'll explain a bit later how it actually works in more detail. Another question is: will using blockchain reduce the bandwidth for data access? That's also a great question.

The short answer is no. The bandwidth for access is not constrained by the blockchain. However, the bandwidth for publishing, for introducing new knowledge into the Decentralized Knowledge Graph, obviously is. And there are two types of bandwidth we're looking at here. One is, let's say, the frequency.

How many knowledge assets can be created over time? And the other one is how big these knowledge assets can be. I'll save this explanation for a bit later, for the right slides, so maybe the question can remain open so I don't forget. But essentially, half of the answer is that the bandwidth for access is not affected, because access doesn't go through the blockchain. I'll show you that in a minute.

So, okay, I'll explain this in a bit more detail, because it can get really complex. What you can visualize is essentially a three-layer architecture. Layer one is these multiple blockchains that I mentioned; we call this the multi-chain consensus layer. OriginTrail layer two is this Decentralized Knowledge Graph. It's its own network, one network.

If you've ever had contact with Web3, you can put it in a similar bucket as something like IPFS, which is a data storage network and also a peer-to-peer decentralized network. However, it doesn't have consensus, so it doesn't have the throughput limitations. Essentially, nodes in the IPFS network, and also in the OriginTrail Decentralized Knowledge Graph network, don't need to reach consensus, because data is essentially replicated. The difference between IPFS and OriginTrail, though, is that IPFS is really for data storage, while OriginTrail is more like a knowledge graph, like a database, right? So you can send it different types of queries.

It has more structure, and it obviously adds knowledge graph features. Many different nodes, currently around 155 running the latest version of OriginTrail, version six, are dispersed around the world. Anybody can run a node. Whoever's on this call today can just go to our website, download the node, and run it on their computer or in a cloud environment, wherever you like. After you set up your node, it becomes part of the network and starts hosting knowledge assets.

And it actually gets compensated for hosting these knowledge assets. Just as when using a blockchain, where, let's say, if I want to send a transaction to somebody, I have to pay a fee, in the same way, if I want to create a knowledge asset, I have to pay a fee. And the fee doesn't go to any central body like me or our company; no, it goes to the nodes that are hosting this knowledge graph. All of the knowledge assets essentially get replicated according to DHT-like (distributed hash table) principles, so that the knowledge is content-addressed on the network as well as replicated, so that there's a very minimal chance of failure.
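
To make "DHT-like, content-addressed and replicated" a little more concrete, here is a minimal sketch of one common way such placement is done. It assumes a Kademlia-style XOR distance between hashed node names and the hashed content, and a replication factor of three; the talk does not say which metric or replication factor the network actually uses, and the node names are hypothetical.

```python
import hashlib


def key_of(data: bytes) -> int:
    """Map arbitrary bytes to a 256-bit integer key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def closest_nodes(content: bytes, node_ids: list[str], replicas: int = 3) -> list[str]:
    """Pick the nodes whose hashed IDs are closest to the content key (XOR distance)."""
    target = key_of(content)
    return sorted(node_ids, key=lambda n: key_of(n.encode("utf-8")) ^ target)[:replicas]


nodes = [f"node-{i}" for i in range(10)]  # hypothetical node names
holders = closest_nodes(b"some knowledge asset", nodes)
print(holders)
```

Because placement depends only on the content and the node IDs, any participant can work out where an asset should live without asking a central index, and losing one holder still leaves other replicas.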

And all of the knowledge assets are accompanied by a certain set of tokens, which the nodes that are hosting them get after they finish hosting them. So it's kind of a pay-as-you-go system. Finally, on top of this two-layer infrastructure, we see knowledge asset applications: applications, such as the one I'll show today, that use these knowledge assets, that verify this knowledge, create it, connect it, query it, and so on. So, like I said, to recap, this is a multi-chain decentralized knowledge graph.

It's actually based on standards such as W3C Decentralized Identifiers and the Verifiable Credentials data model, ERC standards from the blockchain world, such as ERC-721 for NFTs, and GS1 standards, which are very relevant for supply chains, such as the cases I mentioned before. These Uniform Asset Locators that I mentioned are basically standardized identifiers for knowledge assets. And when I touched upon the OriginTrail Parachain, I actually forgot to highlight one more thing, which is one of the reasons why it's there: soon it'll start running incentives for knowledge creation. That means literally you come, you create some knowledge, and you get some tokens in return. So OTP, the OriginTrail Parachain token, is designed essentially to incentivize knowledge creation and running this infrastructure, much as, for example, Bitcoin is designed to incentivize infrastructure runners, the miners in Bitcoin, and so on.

There are a lot of use cases, because this is a very widely applicable technology, like knowledge graphs in general. You can do all kinds of things: recommender systems, search and discovery engines, Q&A systems, but also something that you wouldn't normally associate with knowledge graphs, which are usually centralized, such as knowledge marketplaces; that's one of the cooler things. Also knowledge aggregation, multi-party verification of different statements, and so on. So that's, I guess, the shortest possible explanation of the three layers and how they connect. Having said that, what does a knowledge asset look like? This is like an anatomy of a knowledge asset.

We have knowledge, obviously, at the top: the knowledge asset contains some knowledge. Here that specifically means RDF-structured knowledge, because it's a knowledge graph world. RDF can be produced easily from JSON-LD. If you're a builder, you've probably run into the JSON data structure; JSON-LD is like an extension to that. It makes this JSON object really a graph object. And here you can see an example of an entity in a knowledge asset that describes an event.
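
The slide itself is not part of the transcript. As a rough sketch of what such an entity could look like, the snippet below builds a JSON-LD style event with a nested location and flattens it into subject, predicate, object triples. All identifiers and values are hypothetical, and `to_triples` is a deliberately simplified illustration of the graph idea, not the standard JSON-LD to RDF algorithm (it ignores context expansion, arrays, and blank nodes).

```python
event = {
    "@context": "https://schema.org",
    "@id": "urn:example:event:1",          # hypothetical identifier
    "@type": "Event",
    "name": "Example conference",
    "location": {
        "@id": "urn:example:place:1",      # hypothetical identifier
        "@type": "Place",
        "name": "Example venue",
    },
}


def to_triples(node: dict) -> list[tuple[str, str, str]]:
    """Flatten nested nodes into (subject, predicate, object) triples."""
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if isinstance(value, dict):
            # A nested node becomes an edge to that node, plus the node's own triples.
            triples.append((subject, key, value["@id"]))
            triples.extend(to_triples(value))
        else:
            triples.append((subject, key, str(value)))
    return triples


triples = to_triples(event)
for triple in triples:
    print(triple)
```

The `@id` and `@type` keys are what turn a plain key-value object into nodes and edges: the event points at the place by identifier, and the place carries its own statements.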

And actually, this is the conference we're organizing a month from now. You're all invited, by the way; it's going to be both online and offline. This event has a certain identifier and certain objects connected to it, such as a location, which also has a type, and so on. This extension of context, type, ID, and so on is basically the extension on top of JSON that allows simple JSON-like key-value structures to become graph-based structures.

And then it can be converted into RDF. So think of creating a knowledge asset as creating such a JSON-LD object. Then you can run this command, basically dkg create, and it would replicate this knowledge across the network. It would produce this Uniform Asset Locator, which looks like this.

It's did:dkg: something. This "did" means decentralized identifier. It contains a bunch of things, and I won't go into the depths, but what's really cool here is that it's self-dereferenceable. So you don't need something like DNS, another centralized system somewhere. Rather, your client, your browser, can just read this and understand: aha, this is the blockchain where I need to go check the record, this is the contract on that blockchain, and this is the actual graph entity that I'm looking for.
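
As a sketch of what "self-dereferenceable" means for a client, the snippet below splits such an identifier into the three parts mentioned: the blockchain, the contract, and the entity. The exact layout assumed here (`did:dkg:` followed by three slash-separated segments, the last one numeric) and the example value are assumptions for illustration; the slide with the real identifier is not in the transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UAL:
    blockchain: str
    contract: str
    token_id: int


def parse_ual(ual: str) -> UAL:
    """Split an identifier of the assumed form did:dkg:<blockchain>/<contract>/<token>."""
    prefix = "did:dkg:"
    if not ual.startswith(prefix):
        raise ValueError("not a did:dkg identifier")
    parts = ual[len(prefix):].split("/")
    if len(parts) != 3:
        raise ValueError("expected <blockchain>/<contract>/<token>")
    blockchain, contract, token = parts
    return UAL(blockchain, contract, int(token))


# Hypothetical example value, not a real asset.
example = parse_ual("did:dkg:examplechain/0x0000000000000000000000000000000000000000/42")
print(example)
```

Everything a client needs to find the on-chain record is in the string itself, which is why no DNS-style lookup service is required.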

So there's no need and no reason for any intermediary or some sort of centralized index for that; this DKG index at the bottom is basically implemented in this way. So this UAL allows you to query this knowledge, but also to find this knowledge NFT and to get these knowledge state proofs. What's a state proof? If you're a developer, you probably use Git and GitHub. When you make a Git commit, you get a commit hash, right? Well, this is a very similar thing. It's essentially the hash of the knowledge above.

However, it's not just a simple hash; it's actually a Merkle hash, which allows us to do some very cool things. Again, that's a topic on its own. Long story short, specifically from the blockchain side of things, as you can see, the pink parts are what is stored on the blockchain. So the blockchain hosts only the NFT and only the small hashes. That means we use the blockchain in a minimal way, which means transactions are pretty cheap. We're not trying to put a lot of data on the blockchain, which a lot of people try to do and which, from our experience, is not a good idea.

And there are some very simple reasons for that. It's not designed for it; it's not a database. Even if you put a bunch of data or knowledge in there, you don't have a query engine embedded in blockchains. And it's really expensive: whatever blockchain you take, it ends up being prohibitively expensive.

So essentially the OriginTrail DKG also solves this end of the problem, making a decentralized network that is not prohibitively expensive. It enables that in a couple of different ways, but it also inherits the properties of the blockchain. For example, I can take an arbitrarily large piece of knowledge, say 10 megabytes or something, which normally you cannot fit on any blockchain in one transaction; usually the limits are far lower, in kilobytes or less. But anyhow, I can take a fairly large graph and produce this Merkle hash.

Only this thing, which is a 32-byte hash, ends up being a value that's on the blockchain. And what's cool about it is that I can then take any piece of this graph and produce something called a Merkle proof to show that this piece is actually included in the entire knowledge asset. These are called inclusion proofs, and you can compute them without any need to trust anybody. You can just trust the math and compute that a certain piece of content, for example this name here, is indeed in there, even though you might not even have the entire content. That's a whole other story, which I won't go too deep into right now, but I'm happy to answer any questions on that later.
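
A minimal sketch of how such an inclusion proof works is given below. It is a generic Merkle tree, not the DKG's actual construction: SHA-256, the way odd levels are padded by duplicating the last hash, and the example statements are all assumptions chosen for illustration.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build every level of the tree, from hashed leaves up to the single root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # odd level: duplicate the last hash
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect the sibling hash at each level, with a flag for 'sibling is on the left'."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical statements standing in for the pieces of one knowledge asset.
statements = [
    b"event-1 type Event",
    b"event-1 name Example conference",
    b"event-1 location place-1",
    b"place-1 type Place",
    b"place-1 name Example venue",
]
levels = merkle_levels(statements)
root = levels[-1][0]                       # the 32-byte value that would go on-chain
proof = inclusion_proof(levels, 1)
print(verify(statements[1], proof, root))
```

The verifier needs only the one statement, a handful of sibling hashes, and the root; it never needs the other statements, which is the sense in which you can check inclusion without holding the entire content.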

Essentially, these knowledge assets, the four connected things (the NFT, the proofs, the UAL, and the knowledge), live across these two layers, and they enable all kinds of cool things. One of the cool things, and what I'm going to show today, is really this synergy of symbolic and neural AI. The idea of having a knowledge graph, which has very highly structured data, combined with very unstructured access to data and approaches such as getting a bunch of text and vectorizing it, creating embeddings, is really an interesting one. Especially lately in the knowledge graph world, this connection between the two approaches enables creating very interesting knowledge graph embeddings and solving very interesting knowledge graph problems, such as link prediction or completion of knowledge graphs. Again, it's a huge topic, very interesting for research, and we believe it's going to be one of the next breakthrough areas in the world of AI. Having said that, there are going to be a lot of things happening on the generative side of things.

However, today we're going to focus on a bit of a different thing: not so much on generation, but rather on searching through existing trusted knowledge. So I'm going to show an example of a knowledge asset application. But before that, I see a couple of questions that may expire in people's minds before I get into it, so I'm going to take a short break to answer them. The question is: do node operators get paid in OTP or TRAC? Okay, this is explained in more detail on the OriginTrail website, but briefly, of the two layers that you've seen, there's a blockchain layer and a knowledge graph layer.

The knowledge graph layer nodes are compensated in TRAC tokens for knowledge publishing; that's the utility token of this layer. On the blockchain layer, however, the utility is blockchain transactions, right? So whatever blockchain you use, you have to use the token of that blockchain. On the OriginTrail Parachain, that token is OTP. However, if you're publishing valuable knowledge, and that is something the community decides, by the way, you actually get OTP back as a knowledge publisher. So you spend TRAC, but you get OTP. It's a little complicated, but there's an open RFC right now, an OriginTrail community RFC, number 18, which explains this.

It's available on our GitHub, so I'm super happy to share the link later. The next question: how are publishers validated for genuineness? Okay, I understand the question as: how do we know whether the knowledge somebody publishes into the graph is really true or not? And the short answer is, we don't. Essentially, anybody can publish anything, just as on the internet. The internet is a free environment where anybody can put up anything, right? It doesn't have to be true.

And that is the answer from an infrastructure perspective. If I go back a couple of slides, somewhere here, and you look at this super simple layered infrastructure, the colored part is the infrastructure. The infrastructure of OriginTrail is designed to be neutral. There's a certain set of principles, namely neutrality, inclusiveness, and usability, that we have always used to guide the development of the entire ecosystem. And a big part of this neutrality is the infrastructure: we believe it needs to be neutral.

That means the infrastructure should not be able to decide if something is true or not. Essentially, today we don't have an algorithm that you input something into and it can tell you whether this is true or not, right? That doesn't exist yet. However, we are converging to something of that sort. And something of that sort definitely needs what this infrastructure provides. It provides the ability to verify a source who published something, that they indeed published it in a certain form.

That's why there are these blockchain proofs. But then it also provides the ability to connect different statements that other people have published. So a lot of people can publish a lot of things, and, in a very simplified way, knowledge asset applications get to decide how to make sense out of it and to understand if something is valid or not. So, for example, we can take, let's say, a multi-party verification system. Imagine that there is a certain set of statements that were made by many partners.

Like, one statement says that, I don't know, right now it's daytime in San Francisco. OK, that's something you can easily check if you look out your window. But you could also think about it this way: many different parties can attest to this, or to something else, creating statements in the knowledge graph. This is similar to a system of consensus.

You can see that also in blockchain oracles, for example, systems like Chainlink, or generally in blockchain consensus, the idea being that there are multiple parties that somehow converge and, based on some rules, say, aha, this is true. However, such a rule is not embedded in the infrastructure. That's something that can be done on the application level. The point being that the infrastructure is used to get all of these things together somehow, aggregate them, and enable you to query them. And then when you get that knowledge, you're able to assess: aha, what is the source? Is it Fox News or CNN or whatever, Russian TV? And then you go on that source and source reputation, which, by the way, is also not part of the infrastructure.
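As a rough illustration of the application-level rule described here, the sketch below accepts a statement only when enough distinct publishers have attested to it. The function name, the publisher identifiers, and the threshold are all assumptions made for the example; nothing like this is part of the OriginTrail infrastructure itself.

```python
from collections import defaultdict

def accepted_statements(attestations, min_publishers=3, trusted=None):
    """Return statements attested by at least `min_publishers` distinct publishers.

    attestations: iterable of (publisher_id, statement) pairs.
    trusted: optional set of publisher ids to count; None counts everyone.
    """
    supporters = defaultdict(set)
    for publisher, statement in attestations:
        if trusted is None or publisher in trusted:
            supporters[statement].add(publisher)
    return {s for s, who in supporters.items() if len(who) >= min_publishers}

# Hypothetical attestations; the wallet names are placeholders.
attestations = [
    ("wallet-a", "daytime in San Francisco"),
    ("wallet-b", "daytime in San Francisco"),
    ("wallet-c", "daytime in San Francisco"),
    ("wallet-a", "daytime in San Francisco"),  # repeat by the same publisher counts once
    ("wallet-d", "nighttime in San Francisco"),
]
result = accepted_statements(attestations, min_publishers=3)
```

The `trusted` argument is where an application could plug in its own notion of source reputation, which, as noted above, lives in the application layer rather than in the infrastructure.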

That's something we see as part of the application layer, where you're then able to discern and make a decision. There's a project within the OriginTrail ecosystem that is building something they call the truth chain, which actually enables certain primitives for verification of content. However, it's an ongoing area of research and building. And then the question: wouldn't that influence possible bias? Well, we see bias from kind of a different angle. For example, if you're used to, let's say, the knowledge graph world, and, by the way, knowledge graphs exist within companies such as Netflix.

They use it to recommend what you should watch based on your history. Or Google, they invented the knowledge graph, or Uber uses knowledge graphs. A bunch of companies basically use it to integrate all kinds of information, but they use it in a centralized way, and they govern it. So it's biased, let's say, towards them and their use case and what they believe and so on. OriginTrail is different in that sense, because we make sure that the technology itself doesn't have integrated bias, let's say, so that it leans towards us, the builders of OriginTrail, or whatever. Rather, anybody can publish anything and can connect anything to anything else, just like on the internet.

And then it's up to you, when you're building applications on top, to use this in a clever way and to somehow make sense of it. But the bias itself, as we see it from an infrastructural perspective, would be if there were some sort of specific, let's say, filtering techniques or some decisions on the content which the infrastructure itself would enforce. That's why the infrastructure doesn't enforce anything on the content at all. Rather, the more standardized you make things, the more nicely you structure the knowledge, the more you use standards, and the more queryable you make it, the more discoverable it is and the more people will be able to find it.

So essentially it only kind of incentivizes structure, but it doesn't incentivize any content and so on. So I guess that's the quickest, shortest answer. And because we don't have a lot of time, I'm just going to keep going and show an example. Like I said, I'm going to show one example application; there are a lot of different applications that can be built on OriginTrail. This is just one, and I encourage you to go see more on the website, but this one is going to be about extractive question answering based on Milvus and OriginTrail.

And I actually have a quick diagram, and then I'll go into the app. So on this diagram we have, let's say, four entities, five really, but logically there are four of them. We have a trusted AI application on the left, which uses OriginTrail knowledge assets and the OriginTrail DKG. Somehow this is able to find knowledge, knowledge that was obviously created from some knowledge source previously. So if you remember the slides at the beginning, I said we can create knowledge assets and own them, then we can discover them, we can verify them, and use them.

So we're going to go through the same process here. Imagine there's a knowledge source, and I'll show you one of those knowledge sources. You can create knowledge assets from it, for example, by running an OriginTrail node. And the cool thing you can do is then connect it with Milvus.

So you can basically create embeddings from this knowledge and populate a Milvus vector DB, finally publishing these knowledge assets into the knowledge graph. So once you've created, nicely structured, and vectorized this knowledge, you publish it into the OriginTrail DKG and you make it discoverable. So a trusted AI application, such as, let's say, a QA system, can discover this knowledge, as you can see in this later step. And then one of the things they can do, for example, which we'll show today, is this: imagine the user types in a question, or pronounces a question, or somehow comes up with a question, looking to find a trusted answer.

You can normally do that with just Milvus. You could do some sort of vector-based search over some knowledge base or a set of indexed information in the Milvus DB, and it would return vector search results based on similarity, right? Which is really great. And the great thing to extend it with is knowledge graph queries. So, for example, instead of just answering the questions directly by similarity search, which can also do some really great things, we can also determine the intent of a question by finding vector similarity to, let's say, other questions or content that's available. So in this application I'll show you, we've done exactly that.

So basically we determine intent, and when we determine intent, we know, aha, this actually fits a certain set of model queries, or one query. And in this particular case, it would run a query directly on the knowledge. So it's like a database query; in our case, because it's using semantic web standards, this is a SPARQL query. And it combines that with the result of the similarity search to finally produce a set of results. And all of these results are, again, extracted from this combination of technologies, the OriginTrail node and Milvus, which we like to call a knowledge bank.
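The intent step described here can be sketched in a few lines. This is only an illustration under stated assumptions: the toy bag-of-words `embed` function stands in for a real embedding model, the in-memory loop stands in for a Milvus similarity search, and the intent names, predicate IRIs, and SPARQL templates are hypothetical rather than taken from the demo.

```python
import math
from string import Template

# Hypothetical intents: one example question and one SPARQL template each.
INTENTS = {
    "farm_of_origin": {
        "example": "which farm does this chicken come from",
        "sparql": "SELECT ?farm WHERE { $product <http://example.org/producedAt> ?farm }",
    },
    "product_characteristics": {
        "example": "what are the characteristics of this meat",
        "sparql": "SELECT ?p ?o WHERE { $product ?p ?o }",
    },
}
VOCAB = sorted({w for spec in INTENTS.values() for w in spec["example"].split()})

def embed(text):
    """Toy bag-of-words vector; a real app would call an embedding model."""
    words = text.lower().replace("?", "").split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(question, product_iri, threshold=0.6):
    """Pick the closest intent and fill in its query, or return None if unsure."""
    q = embed(question)
    scores = {name: cosine(q, embed(spec["example"])) for name, spec in INTENTS.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None  # low confidence: fall back to plain similarity search
    query = Template(INTENTS[best]["sparql"]).substitute(product=f"<{product_iri}>")
    return best, query

intent, query = route("Which farm does this chicken come from?",
                      "https://example.org/product/1")
```

The threshold is the design knob: only a high-confidence match triggers the strict graph query, and anything below it can be handled by ordinary similarity search instead.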

There are a few other details in there, but for simplicity, these are the two main components. And then, when extracting this information, you can also verify the source and data integrity on the blockchain. And I'm actually going to show you that very quickly in another browser window. So just a second, somewhere here. Yes.

OK, so let me refresh that. OK, so this is an example application, and I'm going to show you this part. Obviously I cannot show the whole flow. So we already created knowledge assets, and I have two examples here. The first is an example from one of the AgriFood cases we're working on. So imagine a simple chatbot, but one that has two modes.

One mode is to extract information, to basically just do these queries, and there's another one which extracts and summarizes using an LLM. I'll show you both, but essentially I'm going to ask a simple question. I just need to give you some context for this case. As you've seen, I haven't shown the discovery step, but what this particular demo is designed for is this: imagine you went into a store and you bought a piece of poultry, I don't know, chicken legs, something like that.

I'm saying this because this is actually a real case that we're working on in Europe. Imagine you scanned a certain code to get to this URL. This would, by the way, also be standardized, with the new version of the barcode called GS1 Digital Link. But anyhow, long story short, imagine you somehow got this product, you scan it, and you open it up on your phone, and you're basically now chatting with trusted knowledge about that product.

So I can, for example, ask something like, where does this chicken come from? Or, probably better, which farm does this chicken come from? And if I click search, what happens under the hood is actually what I showed you here. So I'll try to maybe move it a bit. So this side is basically now determining intent by contacting the Zilliz platform, which provides hosted Milvus instances. It's taking a bit, I'm not sure why it's taking so long, but ah, here it is, and after that it actually runs a knowledge graph query. So the result you see here is not the prettiest, it's not like a nice ChatGPT, but this is just an example which actually shows these triples from the knowledge graph.

So it says, this is the farm ID, this is the product ID. The owner is Stanis vol, because this is an actual farm from Slovenia. And it shows the certificate URL and a couple of other things. But this is basically the content that was found based on having very high certainty on the intent of the question, and then running a standardized graph query to find the farm.

It's also showing the key component, which is the source knowledge asset with the UAL and owner. If I open this source knowledge asset, we're actually going to be able to explore it. So this is now actually read from the OriginTrail Decentralized Knowledge Graph. If I zoom in, you'll see the farm has this type, Place, like I said, in JSON-LD, Stanis Lavo, with the address and a bunch of other content, including a photo, for example. I guess if I open this link, or I need to save the image. Oh, OK.

Oh no, here it is, so this is the photo of the farm. And there's a bunch of other knowledge assets connected to it. For example, if I click here on these symbols, the graph starts growing and it opens up additional knowledge assets. And somehow I messed something up here in the demo, so I'll just open up the original knowledge asset again.

Ah, here it is, and I'll show you the other part. So this is the NFT that was minted for this knowledge asset. And you can see it was minted 50 days ago, and, you know, the owner and so on.

And this is, by the way, on the testnet, so there are perhaps a few glitches here and there as it's a demo. But you can also see that this knowledge asset was updated a certain number of times. And for each of these updates, you can see who issued it and the state hash of that particular update. You can also see all of the content in terms of these graph resources in here, so you can explore them through this interface.
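To make the idea of checking content against an anchored state hash concrete, here is a minimal sketch. It assumes, purely for illustration, that the hash is SHA-256 over the sorted, line-serialized triples; the actual DKG has its own canonicalization and proof scheme, which this does not reproduce, and the sample triples are hypothetical.

```python
import hashlib

def state_hash(triples):
    """Hash a set of (subject, predicate, object) triples independent of order."""
    lines = sorted(f"{s} {p} {o} ." for s, p, o in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def verify(triples, anchored_hash):
    """True if the content we fetched matches the hash recorded on-chain."""
    return state_hash(triples) == anchored_hash

# Hypothetical content of one knowledge asset state.
published = [
    ("farm:1", "type", "Place"),
    ("farm:1", "country", "Slovenia"),
]
anchored = state_hash(published)            # what the publisher would have anchored
tampered = [("farm:1", "type", "Place"), ("farm:1", "country", "Elsewhere")]
```

With these values, `verify(list(reversed(published)), anchored)` holds because ordering does not matter, while `verify(tampered, anchored)` fails because one object value changed.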

Now, this interface is a generic interface, which is very similar to a blockchain explorer. If you go to a blockchain explorer like this one, you can see all of the transactions on this blockchain. For example, for this one, we can see the minting transaction for the NFT, and it shows how many tokens were spent and so on. In a very similar way, this is the DKG Explorer, and you can basically see all kinds of interesting knowledge assets there.

I can also ask, for example, a different question. I could say, let's say, what are the characteristics of this meat? And if I do that, basically this one was quicker. Essentially it went and took a bunch of graph results. It found the product description, the product URL, the image, and so on.

Or, if I were to send this now to an LLM, obviously the LLM would make this look nicer, so I can do that. I can just say again, what are the characteristics of this meat? But I switched from extract mode to extract-and-summarize mode. And now there's an LLM. However, this LLM has much less chance of hallucinating, again, because we basically gave it all the content. And the prompt essentially says, take this and structure it a little nicer for humans. So it basically gives you a nicer representation of the same knowledge that was extracted.
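The extract-and-summarize mode can be pictured as nothing more than prompt construction around the extracted results. The sketch below is an assumption about how such a prompt might look; the wording, the function name, and the sample triples are hypothetical, and no particular LLM API is called.

```python
def build_summary_prompt(question, triples):
    """Build a prompt that asks an LLM to rephrase extracted facts, not add new ones."""
    facts = "\n".join(f"- {s} | {p} | {o}" for s, p, o in triples)
    return (
        "Rewrite the facts below as a short, readable answer to the question.\n"
        "Use only these facts; if they do not answer the question, say so.\n\n"
        f"Question: {question}\n"
        f"Facts:\n{facts}\n"
    )

# Hypothetical extracted triples, standing in for a knowledge graph query result.
triples = [
    ("product:123", "description", "Fresh chicken legs"),
    ("product:123", "image", "https://example.org/img/123.jpg"),
]
prompt = build_summary_prompt("What are the characteristics of this meat?", triples)
```

Because every fact in the prompt comes from the graph query, the model is only asked to reformat, which is why the risk of hallucination is lower than with open-ended generation.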

I can also open up these knowledge assets, and we can look into all of them. There's a bunch of different ones. So this is one example, for the food industry. I can also show you this medicine example, and I can even do that with audio. We just recently tried Google's AI model, a tool really, called Chirp.

So I can try and record it, if I have a second. Let's see. Let's say I scan the medicine. Now I can ask it.

Let's say, can I take this medicine with alcohol? Sure, let's do that. Can I take this medicine with alcohol? This interface is not the nicest, it's a demo, but basically I got the question. So there was basically some NLP involved, and then the result was basically just a simple semantic search on the knowledge asset, which found that taking this medicine with alcohol is something you should not do, obviously. So if I were to open this knowledge asset, I would actually see quite a rich knowledge asset, with a lot of information that was produced from the leaflet of this medicine.

So that would be a very, let's say, simple application, and I'm going to go back to the chart here, that used knowledge assets created from certain sources, and you can see who they are. Basically, you can see the identifiers on the blockchain, basically the wallets that published this. And because they were nicely structured in the knowledge bank, they could be discovered and then used through such a question answering system. And one of the big reasons why the DKG is so special is that it's designed not to be a system where, you know, everybody uses their own centralized entity; rather, it's a decentralized knowledge graph. So many different people can collectively populate it.

And while populating it, they can also connect different knowledge assets to each other, building such a large trusted knowledge base together. And therefore this incentivization mechanism is what is really interesting, and we're excited to see it kick off in the near future so that we can get some really, really high-quality trusted knowledge. Basically, the biggest knowledge graph in the world is what OriginTrail is aiming to become. Also, in order to do this, we have launched a grant program of 1 million TRAC tokens within something we call ChatDKG. So, pun intended, of course. This program, ChatDKG, is really a builders' program for building trusted AI tools to fight misinformation, based on knowledge assets.

If you were to scan this QR code, it would lead you to the website, chatdkg.ai, which is really a repo where you can find all the details about this program, including how you can apply for grants and which types of applications qualify. For example, question answering systems similar to the one I just showed, which are open source and which you would build together with the community, are something that qualifies for such grants. So if you're a builder, head over to the website; we'd also be super happy to meet you in our Discord. So, having said all that, and since we're almost at the end of our time, I'm going to close it here with one final quote from the late founder of Intel, Robert Noyce, who said: knowledge is power, but knowledge shared is power multiplied. And that's the motto we go by at OriginTrail.

Thanks, everybody. I see a bunch of questions, but before we jump into them, I'll quickly hand over to you, Eugene, to continue, I guess, with this question answering, or maybe just give your thoughts on the topic. How much time do we have left, by the way? Is it eight minutes? Yeah, we've got like eight minutes left. Let's jump into the questions. There were quite a few questions, so I think we can just address the questions first, and if there's time at the end, I can give some thoughts there.

Awesome. OK. So you can start with the first question if you would like. Sure, sure. Yeah, I'll go ahead.

Maybe hallucination is part of the ambiguity of natural language; that's the reason that humans invented math. How are we going to resolve this dichotomy between NLP and accuracy? Oh, actually, the way I understand this question, I really like it, because you said something really interesting: ambiguity and math. In the world of knowledge graphs, math is equivalent to, essentially, this way of representing knowledge with technologies such as RDF and something called ontologies. So, for example, the math of the knowledge graph is really these ontologies, where you define a certain knowledge domain. And that is really cool, because if you define a certain knowledge domain, it's very similar to, essentially, mathematical logic, and therefore you can do inferencing on the knowledge graph level.

So, for example, you can do some very simple inferencing by exploiting the capabilities of graph connections, such as, you know, inferring that somebody is a relative of somebody else. And what this is often used for with knowledge graphs, by the way, is fraud detection. There are ways to detect similarities in graph structure based on this inferencing. It's a very, very wide topic, but essentially it's a really good grounding tool for the ambiguity that we can get with NLP, and generally for accuracy. So I encourage looking into knowledge graph inferencing and ontologies to go deeper into this topic.
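A tiny example may help show what inferencing over graph connections means in practice. The sketch below forward-chains two hand-written rules, symmetry and transitivity, over a hypothetical `relativeOf` predicate. In a real ontology these would typically be declared with OWL constructs such as `owl:SymmetricProperty` and `owl:TransitiveProperty` and evaluated by a reasoner; the plain-Python loop and the sample names are assumptions for illustration only.

```python
def infer_relatives(triples):
    """Forward-chain two rules over 'relativeOf' until no new facts appear.

    symmetry:     (a relativeOf b)                   => (b relativeOf a)
    transitivity: (a relativeOf b), (b relativeOf c) => (a relativeOf c), a != c
    """
    facts = {(s, o) for s, p, o in triples if p == "relativeOf"}
    changed = True
    while changed:
        new = {(b, a) for a, b in facts}
        new |= {(a, d) for a, b in facts for c, d in facts if b == c and a != d}
        changed = not new <= facts  # stop once the rules add nothing new
        facts |= new
    return facts

# Hypothetical stated facts: only two links are published explicitly.
stated = [
    ("ana", "relativeOf", "bo"),
    ("bo", "relativeOf", "cy"),
]
inferred = infer_relatives(stated)
```

From two stated links the rules derive the remaining connections, including that `ana` and `cy` are related, without anyone having published that statement.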

Again, these are just tools to help with this, so they don't solve everything. OK, I'm moving on to the next one. I missed the step where the use of an LLM is justified. You can already query your knowledge base exactly, without errors, using SPARQL, by putting vectors of your ontology into the query. Are you just getting a human-language query interface instead of SPARQL, to an extent? So, you are adding a much easier way to interact with the knowledge and the knowledge graph.

And essentially, if I go back to this chart, you won't see the LLM. Of course, there was a population of embeddings within the creation step to enable easier search, and vector similarity search is a component of this example application, as I said. And that makes it easier to query with natural language, which is really important, because not a lot of people know SPARQL. And I believe that one of the big reasons why this whole great idea of the semantic web, which is embodied today in knowledge graphs, didn't take off is exactly because it was too complex. Like, when Tim came up with it, which was 20-plus years ago, it was great, it was an amazing idea, but it became really complex.

And one of the reasons is, you know, this usability. So that's why usability is a big principle behind OriginTrail. It has to be easy to use these technologies, and this is one main way to make it much easier. But yet again, you're not querying an LLM to generate an answer for you. You're really just vectorizing a question at best.

And then, ideally, you're not just using a very basic vector similarity search; you're somehow trying to figure out the intent of the question and then do a very strict, I may say, knowledge graph query. OK, one more question: and if so, how do you mitigate the errors caused by the approximate nature of vectorizing? It's a very good question. It relates to what I just said briefly. So the idea would be to look at this as a system; that's why we actually called it a trusted semantic search app. Instead of thinking of it as a question having one right answer, such systems, which provide you with search results, have a bit less of a promise to give.

So they don't say, I'm telling you the truth; rather, they say, I'm giving you a set of results from a database or a knowledge graph, where you can verify the sources, the integrity, and all this information, and you can then decide what you do with it. So ultimately the end application is responsible for somehow handling all of this information, exactly because of that topic we mentioned in the beginning, the bias. So, long story short: minimizing the use of LLMs, maximizing the use of knowledge graph queries, and then also not making the final decision for whoever is making the query, but rather enabling them to discover this information through a kind of search-result system. So that would be the shortest possible answer with the little time we have.

But thanks, everybody, for the great questions. These were totally on point. All right, that was a great last question to end on. And we also happen to be running right up to the edge of the hour, so we only have a couple of minutes left. Let's wrap it up here, then.

Thank you, everybody, for coming, and thank you, Branimir, for a great presentation. Thanks a lot for the invite, and thanks, everybody. I hope you could follow. Feel free to jump into our Discord if you want to continue the discussion, or reach out to me on Twitter, LinkedIn, Telegram, or wherever. I'm everywhere. So thanks again, everybody.

Have a great day and a great rest of the week.

Meet the Speaker

Join the session for live Q&A with the speaker

Branimir Rakic
Founder and CTO, Trace Labs
As the founder and CTO of Trace Labs (core developers of OriginTrail), Branimir has been the architect and creator of the OriginTrail Decentralized Knowledge Graph (DKG), designed for making AI-grade knowledge assets discoverable and verifiable across the web. Branimir has been particularly focused on networking problems throughout his career, spanning decentralized networks (apart from the DKG, he also has vast experience in the Ethereum and Polkadot ecosystems), semantic networks, and even mobile networks. A strong advocate of open source and permissionless systems, Branimir is active in standards development working groups within GS1 and various blockchain communities, a member of the Serbian Entrepreneurs collective, and an alumnus of the H-Farm accelerator.
