What Makes "Deep Research"? A Dive into AI Agents
WEBVTT
1 00:00:03.565 --> 00:00:05.555 I'm pleased to introduce today's session,
2 00:00:05.825 --> 00:00:08.955 What Makes Deep Research: A Dive into AI Agents,
3 00:00:09.175 --> 00:00:10.875 and our guest speaker, Stefan Webb.
4 00:00:11.415 --> 00:00:15.275 Stefan is a developer advocate at Zilliz, where he advocates
5 00:00:15.275 --> 00:00:17.355 for the open source vector database, Milvus.
6 00:00:17.805 --> 00:00:20.475 Prior to this, he spent three years in industry
7 00:00:20.475 --> 00:00:23.275 as an applied ML researcher at Twitter
8 00:00:23.335 --> 00:00:25.715 and Meta, collaborating with product teams
9 00:00:25.775 --> 00:00:27.755 to tackle their most complex challenges.
10 00:00:28.305 --> 00:00:31.195 Stefan holds a PhD from the University of Oxford,
11 00:00:31.455 --> 00:00:33.035 and he has published papers
12 00:00:33.255 --> 00:00:36.835 in leading machine learning conferences such
13 00:00:36.835 --> 00:00:39.195 as NeurIPS, ICLR, and ICML.
14 00:00:39.655 --> 00:00:41.635 He is passionate about generative AI
15 00:00:41.815 --> 00:00:44.435 and is eager to leverage his deep technical expertise
16 00:00:44.495 --> 00:00:46.435 to contribute to the open source community.
17 00:00:46.775 --> 00:00:47.915 Welcome, Stefan.
18 00:00:48.455 --> 00:00:50.755 Thanks so much, Sachi. Thanks for the kind introduction.
19 00:00:51.135 --> 00:00:53.955 And you're right, I'm very passionate about generative AI
20 00:00:54.455 --> 00:00:56.835 and also about helping developers.
21 00:00:57.455 --> 00:01:02.155 So I really love doing webinars like this,
22 00:01:02.155 --> 00:01:03.395 meeting some of our users
23 00:01:03.655 --> 00:01:06.035 and people who are just interested in
24 00:01:06.035 --> 00:01:07.515 vector databases and AI.
25 00:01:08.615 --> 00:01:10.435 Just a tiny bit more
26 00:01:10.435 --> 00:01:11.635 about myself before I get started.
27 00:01:12.255 --> 00:01:17.075 I am a developer advocate for Zilliz,
28 00:01:17.575 --> 00:01:19.795 the company behind the leading open source
29 00:01:20.335 --> 00:01:22.355 vector database, Milvus.
30 00:01:23.055 --> 00:01:26.155 As a developer advocate, I serve as
31 00:01:26.155 --> 00:01:30.475 a bridge between the users
32 00:01:30.815 --> 00:01:33.915 of Milvus and the developers.
33 00:01:33.935 --> 00:01:38.515 That means providing technical support to users,
34 00:01:38.585 --> 00:01:43.515 helping connect users with engineers for
35 00:01:43.615 --> 00:01:44.955 deeper technical support,
36 00:01:45.705 --> 00:01:48.755 and also running a lot of events
37 00:01:49.385 --> 00:01:50.955 like these webinars.
38 00:01:51.095 --> 00:01:55.195 We do a monthly meetup in the Bay Area,
39 00:01:55.855 --> 00:01:58.435 and I produce
40 00:01:58.665 --> 00:02:00.075 some written content as well.
41 00:02:01.495 --> 00:02:03.915 I've put my LinkedIn there.
42 00:02:04.135 --> 00:02:07.155 I always love connecting
43 00:02:07.225 --> 00:02:08.515 with new folks.
44 00:02:09.315 --> 00:02:10.875 I love hearing what you're building
45 00:02:10.875 --> 00:02:13.795 with generative AI, what your
46 00:02:13.795 --> 00:02:14.875 challenges are,
47 00:02:15.015 --> 00:02:18.235 and what your visions are.
48 00:02:18.895 --> 00:02:20.235 That's my bread and butter.
49 00:02:20.455 --> 00:02:23.395 So please connect with me on LinkedIn.
50 00:02:23.475 --> 00:02:24.835 I would love to hear from you.
51 00:02:25.015 --> 00:02:27.355 And if
52 00:02:27.735 --> 00:02:28.955 you're building a RAG
53 00:02:29.055 --> 00:02:31.875 or agent system at your startup
54 00:02:31.935 --> 00:02:35.035 or your company, I think there's a really good opportunity
55 00:02:35.215 --> 00:02:39.515 for a developer advocate at a company like Zilliz
56 00:02:39.695 --> 00:02:40.755 to help
57 00:02:40.815 --> 00:02:45.395 and provide some consultation.
58 00:02:46.175 --> 00:02:50.195 So with that, let's get started with the webinar.
59 00:02:51.255 --> 00:02:55.555 The topic for today is What Makes Deep Research,
60 00:02:56.375 --> 00:02:59.075 and I've subtitled it A Dive into AI Agents.
61 00:02:59.745 --> 00:03:02.565 We're going to be talking about research agents
62 00:03:03.205 --> 00:03:04.725 specifically,
63 00:03:04.745 --> 00:03:06.845 but I think a lot of this also relates
64 00:03:07.465 --> 00:03:09.925 to generative AI agents in general.
65 00:03:12.065 --> 00:03:15.165 I will start off by giving a tiny bit
66 00:03:15.165 --> 00:03:19.845 of background on OpenAI's Deep Research release,
67 00:03:20.785 --> 00:03:24.485 and then I'm going to introduce a research agent
68 00:03:24.755 --> 00:03:29.365 inspired by it, produced by
69 00:03:30.085 --> 00:03:32.685 engineers at Zilliz and fully open sourced.
70 00:03:33.745 --> 00:03:37.285 I say demo, but it's more of
71 00:03:37.285 --> 00:03:38.405 a code walkthrough:
72 00:03:38.875 --> 00:03:40.965 I'll explain how it was put together.
73 00:03:42.475 --> 00:03:45.845 Then after that I'll talk a bit about some of the ideas
74 00:03:46.505 --> 00:03:50.765 behind agents in general, but also research agents:
75 00:03:51.305 --> 00:03:52.685 what's new,
76 00:03:52.865 --> 00:03:56.005 and why deep research
77 00:03:56.005 --> 00:03:58.485 has come on the scene so recently.
78 00:03:59.465 --> 00:04:02.325 With that discussion,
79 00:04:02.865 --> 00:04:05.125 it should become clear what some of the
80 00:04:05.125 --> 00:04:09.645 challenges
81 00:04:09.805 --> 00:04:11.765 and obstacles to
82 00:04:11.945 --> 00:04:13.165 wider adoption are.
83 00:04:13.345 --> 00:04:15.285 We'll talk about some of those
84 00:04:15.985 --> 00:04:17.645 and some potential solutions.
85 00:04:17.705 --> 00:04:21.565 That should give you a sense of
86 00:04:22.015 --> 00:04:25.125 where things are headed over the short term,
87 00:04:25.195 --> 00:04:27.085 the next six to twelve months.
88 00:04:27.945 --> 00:04:29.045 So with
89 00:04:29.045 --> 00:04:33.925 that, let's get started.
90 00:04:34.105 --> 00:04:38.005 By the way, feel free to ask questions
91 00:04:38.705 --> 00:04:41.085 in the chat as they occur to you.
92 00:04:42.065 --> 00:04:44.165 I'll stop
93 00:04:44.365 --> 00:04:46.325 whenever a question comes in,
94 00:04:46.325 --> 00:04:48.605 or try my best to do so, and take them as they come in.
95 00:04:49.625 --> 00:04:51.725 Okay.
96 00:04:52.385 --> 00:04:56.125 I'm sure everyone here has heard about
97 00:04:56.835 --> 00:05:00.925 OpenAI's new product release,
98 00:05:00.955 --> 00:05:05.325 Deep Research, which was released
99 00:05:05.665 --> 00:05:09.365 near the start of February, last month.
100 00:05:10.465 --> 00:05:13.085 This is a bit of a different product
101 00:05:13.465 --> 00:05:18.245 from their straight chatbot in that
102 00:05:18.865 --> 00:05:23.365 it is able to go off,
103 00:05:23.425 --> 00:05:26.925 search the web, and use other tools
104 00:05:27.465 --> 00:05:31.965 to build a really detailed report answering your question.
105 00:05:32.905 --> 00:05:33.965 So you can give it a question.
106 00:05:34.225 --> 00:05:36.125 I've just taken a screenshot here,
107 00:05:37.225 --> 00:05:39.205 and this is an example.
108 00:05:39.865 --> 00:05:44.645 The question in this case might have been:
109 00:05:44.705 --> 00:05:48.365 please research freestyle snowboards suitable
110 00:05:49.105 --> 00:05:50.685 for an intermediate rider,
111 00:05:51.225 --> 00:05:55.965 and then the user has given some details: their height,
112 00:05:55.965 --> 00:05:57.645 their weight, shoe size, et cetera.
113 00:05:58.705 --> 00:06:03.245 Then this agent goes off,
114 00:06:03.985 --> 00:06:07.565 searches the web,
115 00:06:08.625 --> 00:06:13.165 and is able to
116 00:06:13.165 --> 00:06:16.125 work out how to answer this question,
117 00:06:16.905 --> 00:06:21.725 iterating from one step to another.
118 00:06:22.665 --> 00:06:25.485 After some time, could be eight minutes,
119 00:06:25.485 --> 00:06:29.565 could be 30 minutes, it synthesizes a really detailed,
120 00:06:30.145 --> 00:06:32.645 coherent, informed report.
121 00:06:33.745 --> 00:06:36.525 This is much different from the sort
122 00:06:36.525 --> 00:06:40.725 of plain old ChatGPT that just
123 00:06:41.945 --> 00:06:44.765 returns you an answer more
124 00:06:44.765 --> 00:06:46.965 or less in real time, rather than going off
125 00:06:47.425 --> 00:06:50.325 and going through a lot of autonomous steps.
126 00:06:53.065 --> 00:06:57.045 I think
127 00:06:57.865 --> 00:07:01.045 the reason this exploded in
128 00:07:01.045 --> 00:07:02.885 the media was that
129 00:07:02.885 --> 00:07:05.565 people were really impressed by the results.
130 00:07:06.505 --> 00:07:11.365 It seemed to do a very good job of actually researching
131 00:07:11.785 --> 00:07:15.205 a topic that might require not just a plain answer,
132 00:07:15.265 --> 00:07:17.085 but might actually require going off,
133 00:07:17.355 --> 00:07:18.805 looking at multiple sources,
134 00:07:21.885 --> 00:07:23.105 and asking further questions.
135 00:07:24.045 --> 00:07:26.945 I think I read somewhere that one
136 00:07:26.975 --> 00:07:29.905 professor said this
137 00:07:29.905 --> 00:07:32.265 could replace
138 00:07:32.535 --> 00:07:35.865 an early-stage PhD student in terms of doing
139 00:07:35.895 --> 00:07:38.145 research, and
140 00:07:38.145 --> 00:07:39.225 other professionals were
141 00:07:39.225 --> 00:07:40.265 really impressed with the results.
142 00:07:40.725 --> 00:07:42.825 But what exactly was new about it?
143 00:07:42.895 --> 00:07:45.865 Well, it wasn't the first research agent
144 00:07:46.385 --> 00:07:50.105 released commercially: Google's Deep Research
145 00:07:50.685 --> 00:07:53.985 was released about a month earlier, in December.
146 00:07:56.045 --> 00:07:58.945 So what exactly was new about it?
147 00:07:58.945 --> 00:08:01.945 What was it about it
148 00:08:01.945 --> 00:08:05.745 that produced this much superior output,
149 00:08:06.005 --> 00:08:07.025 qualitatively?
150 00:08:08.485 --> 00:08:11.825 I think the answer is that
151 00:08:12.235 --> 00:08:14.745 we're not really sure, because it's closed source.
152 00:08:15.375 --> 00:08:17.625 The design is a tightly guarded secret.
153 00:08:18.365 --> 00:08:21.945 But from the sort
154 00:08:21.945 --> 00:08:23.065 of rumor mill,
155 00:08:23.385 --> 00:08:25.905 I suppose people speaking to insiders,
156 00:08:26.445 --> 00:08:29.065 plus the blog
157 00:08:29.065 --> 00:08:32.625 announcement that OpenAI released, it seems like a big
158 00:08:32.825 --> 00:08:36.305 element of it was
159 00:08:36.455 --> 00:08:40.425 that it focuses on
160 00:08:41.405 --> 00:08:43.625 end-to-end training
161 00:08:43.625 --> 00:08:47.545 with reinforcement learning on really high quality
162 00:08:47.575 --> 00:08:51.265 reasoning trace data, which we'll discuss more in a minute.
163 00:08:51.765 --> 00:08:53.025 But again,
164 00:08:53.305 --> 00:08:55.025 possibly there are other things in the design.
165 00:08:55.625 --> 00:08:57.285 We just don't know, because it's closed source,
166 00:08:58.065 --> 00:09:00.765 but we can guess what they are by trying
167 00:09:00.785 --> 00:09:03.285 to reproduce a system
168 00:09:03.395 --> 00:09:07.725 that can achieve similar results on
169 00:09:07.945 --> 00:09:09.405 the benchmarks being used.
170 00:09:10.265 --> 00:09:14.685 One such model is
171 00:09:14.995 --> 00:09:17.125 from DeepSeek:
172 00:09:17.505 --> 00:09:19.645 DeepSeek R1.
173 00:09:19.645 --> 00:09:21.645 We'll talk about that a bit later on.
174 00:09:25.405 --> 00:09:28.135 Okay. So what exactly is a research agent,
175 00:09:28.555 --> 00:09:33.375 and how does a research agent differ from
176 00:09:33.595 --> 00:09:36.655 an agent in the general sense?
177 00:09:37.595 --> 00:09:40.255 I think it's one of those things in generative AI:
178 00:09:40.835 --> 00:09:44.775 people disagree on the definitions so far.
179 00:09:45.395 --> 00:09:47.015 We're still coalescing
180 00:09:47.015 --> 00:09:49.055 around an exact definition.
181 00:09:50.515 --> 00:09:54.735 My definition, which I think overlaps with
182 00:09:54.735 --> 00:09:59.135 many people's, is that it's an agent
183 00:09:59.135 --> 00:10:02.375 whose goal is
184 00:10:02.595 --> 00:10:06.015 to do research, in the sense that it has to go off
185 00:10:06.555 --> 00:10:10.255 and discover many relevant sources.
186 00:10:10.635 --> 00:10:14.095 It is not just doing a single lookup to
187 00:10:14.295 --> 00:10:16.455 a vector database,
188 00:10:16.455 --> 00:10:19.495 and it's not just accessing a single Wikipedia page.
189 00:10:20.205 --> 00:10:23.775 It's
190 00:10:23.775 --> 00:10:28.695 making a decision about various sources to search,
191 00:10:30.275 --> 00:10:32.215 then
192 00:10:32.545 --> 00:10:35.015 breaking the question down into multiple steps,
193 00:10:35.995 --> 00:10:40.295 autonomously
194 00:10:40.395 --> 00:10:42.415 reasoning
195 00:10:42.415 --> 00:10:45.775 through answering the question, and then synthesizing a
196 00:10:46.135 --> 00:10:49.015 detailed report at the end.
197 00:10:49.955 --> 00:10:52.655 I've got some quotes here from
198 00:10:53.115 --> 00:10:55.575 the Deep Research release blog,
199 00:10:56.275 --> 00:10:58.615 and I saw three themes.
200 00:10:59.355 --> 00:11:03.735 We've got iteration, we've got search,
201 00:11:04.115 --> 00:11:07.295 or I guess more generally tool usage,
202 00:11:07.835 --> 00:11:10.655 and the third is reasoning.
203 00:11:11.835 --> 00:11:16.055 Under the topic of iteration, the
204 00:11:16.325 --> 00:11:20.335 Deep Research release blog post
205 00:11:20.355 --> 00:11:24.775 described things like learning to plan,
206 00:11:25.005 --> 00:11:29.015 executing a multi-step trajectory, backtracking,
207 00:11:29.115 --> 00:11:31.815 and reacting to real-time information.
208 00:11:33.115 --> 00:11:36.655 This is obviously describing
209 00:11:36.655 --> 00:11:40.095 an agent that is able to know what
210 00:11:40.095 --> 00:11:43.535 to do next autonomously, pivoting
211 00:11:43.555 --> 00:11:46.015 as needed in reaction to information it encounters.
212 00:11:47.635 --> 00:11:48.935 The second theme,
213 00:11:48.935 --> 00:11:51.175 and there's
214 00:11:51.245 --> 00:11:52.415 overlap between these three,
215 00:11:52.875 --> 00:11:56.565 is search. Under this topic, the
216 00:11:56.565 --> 00:12:00.285 blog post contained things like trained end-to-end with
217 00:12:00.885 --> 00:12:02.805 reinforcement learning on hard browsing
218 00:12:02.805 --> 00:12:05.485 and reasoning tasks across a range of domains.
219 00:12:06.025 --> 00:12:08.045 It's generally supposed
220 00:12:08.045 --> 00:12:10.005 that this is the main
221 00:12:10.005 --> 00:12:14.125 secret-sauce ingredient. It's also optimized
222 00:12:14.125 --> 00:12:16.285 for web browsing and data analysis.
223 00:12:17.985 --> 00:12:22.565 The third theme,
224 00:12:22.695 --> 00:12:25.605 which overlaps with iteration and search, is reasoning:
225 00:12:26.305 --> 00:12:28.685 fine-tuned on the upcoming
226 00:12:29.275 --> 00:12:31.005 OpenAI o3 reasoning model,
227 00:12:31.505 --> 00:12:33.845 it leverages reasoning to search, interpret,
228 00:12:33.905 --> 00:12:37.645 and analyze massive amounts of text.
229 00:12:39.025 --> 00:12:41.445 So we can piece
230 00:12:41.445 --> 00:12:44.565 together from this
231 00:12:44.915 --> 00:12:47.365 release blog post how it might work,
232 00:12:47.945 --> 00:12:49.805 relate that to the
233 00:12:49.805 --> 00:12:51.765 latest developments happening in generative AI,
234 00:12:52.305 --> 00:12:53.845 and try
235 00:12:53.845 --> 00:12:58.045 to reproduce the results
236 00:12:58.505 --> 00:12:59.605 by building our own system.
237 00:13:05.195 --> 00:13:06.575 And that's exactly what we did.
238 00:13:06.575 --> 00:13:10.295 We were very excited by the release,
239 00:13:10.955 --> 00:13:15.335 having seen the
240 00:13:16.115 --> 00:13:17.975 quality of the output.
241 00:13:18.755 --> 00:13:23.135 And we were really curious, being
242 00:13:23.175 --> 00:13:26.215 a vector database company, vector databases being one
243 00:13:26.375 --> 00:13:29.375 of the core components powering agents:
244 00:13:30.035 --> 00:13:31.335 we were really curious
245 00:13:31.335 --> 00:13:35.975 whether we could build our own open source version
246 00:13:35.975 --> 00:13:37.015 that works similarly.
247 00:13:37.675 --> 00:13:39.375 That's what we did about a month ago.
248 00:13:39.755 --> 00:13:42.575 Some engineers built an open source
249 00:13:43.335 --> 00:13:44.935 piece of software called DeepSearcher.
250 00:13:47.965 --> 00:13:50.705 Like Deep Research, you give it a query;
251 00:13:51.325 --> 00:13:55.265 it then goes off, searches through multiple sources,
252 00:13:55.295 --> 00:13:58.745 iterates, breaking down the
253 00:13:58.765 --> 00:14:01.705 question into steps it can iterate over,
254 00:14:02.175 --> 00:14:06.185 finally makes a decision about when to stop
255 00:14:06.525 --> 00:14:07.825 answering the question,
256 00:14:08.765 --> 00:14:11.305 and then synthesizes a detailed
257 00:14:11.525 --> 00:14:12.865 report from all that information.
258 00:14:16.535 --> 00:14:19.595 This research agent
259 00:14:20.105 --> 00:14:24.435 is built on top of the vector database Milvus,
260 00:14:24.485 --> 00:14:27.515 to which we at Zilliz are the main contributors.
261 00:14:28.095 --> 00:14:31.195 Milvus has been donated to the Linux Foundation for AI
262 00:14:31.295 --> 00:14:34.675 and Data
263 00:14:35.005 --> 00:14:36.395 since, I think, 2020.
264 00:14:37.135 --> 00:14:39.635 Let me just say a few words about Milvus
265 00:14:39.635 --> 00:14:43.595 before we go into the code of DeepSearcher.
266 00:14:44.735 --> 00:14:49.275 Milvus is a fully open source, Apache-licensed
267 00:14:49.535 --> 00:14:52.125 library, so it's suitable for commercial use,
268 00:14:53.105 --> 00:14:54.925 and it's very simple to use.
269 00:14:55.225 --> 00:14:58.605 You can pip install the lite version
270 00:14:58.625 --> 00:15:03.205 in your notebook, and a much more
271 00:15:03.445 --> 00:15:05.165 scalable version you can launch in a
272 00:15:05.165 --> 00:15:06.405 Docker image really easily.
273 00:15:07.625 --> 00:15:12.325 Then we have a third version,
274 00:15:12.415 --> 00:15:15.325 which is the fully distributed Milvus cluster
275 00:15:16.475 --> 00:15:19.605 that you launch on a cluster
276 00:15:19.605 --> 00:15:20.885 of machines via Kubernetes;
277 00:15:21.385 --> 00:15:25.125 it can scale to literally tens
278 00:15:25.125 --> 00:15:27.085 to hundreds of billions of vectors.
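To make the pip-install path concrete, here is a minimal sketch using Milvus Lite through the pymilvus client; the collection name, dimension, and toy data are arbitrary choices for illustration.

```python
# Minimal Milvus Lite sketch: pip install pymilvus
# (Milvus Lite runs embedded, backed by a local file.)
from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")  # file-backed local instance
client.create_collection(collection_name="demo", dimension=8)

# Insert one toy vector, then run a similarity search against it.
client.insert(collection_name="demo",
              data=[{"id": 0, "vector": [0.1] * 8, "text": "hello"}])
hits = client.search(collection_name="demo", data=[[0.1] * 8], limit=3,
                     output_fields=["text"])
print(hits)
```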
279 00:15:31.985 --> 00:15:36.925 Milvus is easy to set up and has really good integration
280 00:15:37.715 --> 00:15:41.205 into the generative AI tooling ecosystem.
281 00:15:42.905 --> 00:15:46.725 Because it's open source, we have a lot of
282 00:15:47.045 --> 00:15:50.205 contributions from pretty much all of
283 00:15:50.205 --> 00:15:53.085 the big tools in generative AI,
284 00:15:53.905 --> 00:15:57.685 whether that's Hugging Face, OpenAI, LangChain, Jina,
285 00:15:58.185 --> 00:16:02.765 or Airbyte. There are, I would say,
286 00:16:03.225 --> 00:16:05.325 dozens and dozens of these integrations,
287 00:16:05.425 --> 00:16:09.485 so you'll most likely be able to use it within your
288 00:16:09.705 --> 00:16:10.965 existing tool set.
289 00:16:14.025 --> 00:16:17.085 I think a strong
290 00:16:17.085 --> 00:16:21.205 recommendation for its performance
291 00:16:21.425 --> 00:16:24.605 and reliability is the fact that it's used by a lot
292 00:16:24.625 --> 00:16:25.765 of really big companies:
293 00:16:26.345 --> 00:16:30.645 everyone from NVIDIA, Microsoft, and Salesforce
294 00:16:31.785 --> 00:16:33.125 to IKEA, and so on.
295 00:16:37.145 --> 00:16:41.485 Just to mention very briefly:
296 00:16:41.605 --> 00:16:45.805 a big use for vector databases is
297 00:16:45.985 --> 00:16:49.845 retrieval augmented generation, as well as
298 00:16:49.845 --> 00:16:51.805 what we are now calling agents, which are
299 00:16:51.805 --> 00:16:53.365 extensions of this framework.
300 00:16:54.065 --> 00:16:55.925 Just to make it clear
301 00:16:55.925 --> 00:16:57.045 where the vector database fits in,
302 00:16:57.935 --> 00:17:01.025 I've got a schematic here of a RAG pipeline.
303 00:17:02.045 --> 00:17:04.825 You start off with a knowledge base of things
304 00:17:04.825 --> 00:17:05.865 that you want to search over.
305 00:17:06.245 --> 00:17:09.545 By the way, our
306 00:17:09.785 --> 00:17:13.905 research agent pipeline will be an extension of
307 00:17:14.065 --> 00:17:15.385 this basic RAG pipeline.
308 00:17:16.605 --> 00:17:18.985 So we've got a knowledge base that we want to search over.
309 00:17:19.085 --> 00:17:21.425 This might be your internal company documents.
310 00:17:22.205 --> 00:17:25.865 It might be images from customers
311 00:17:25.965 --> 00:17:27.585 or videos that people uploaded.
312 00:17:29.005 --> 00:17:32.185 You then put that through your embedding deep neural network
313 00:17:32.765 --> 00:17:34.465 to produce vector embeddings,
314 00:17:34.925 --> 00:17:37.825 and then you store those in Milvus.
315 00:17:38.945 --> 00:17:43.005 Milvus then provides a really convenient
316 00:17:43.145 --> 00:17:47.085 and efficient interface for performing a similarity search,
317 00:17:47.625 --> 00:17:50.605 or essentially a semantic search.
318 00:17:51.265 --> 00:17:52.685 In a RAG chatbot,
319 00:17:52.705 --> 00:17:54.605 the user then comes along with their question.
320 00:17:56.145 --> 00:17:58.685 This question gets put
321 00:17:58.685 --> 00:18:00.405 through, typically, the same embedding model,
322 00:18:01.345 --> 00:18:04.885 and then we search for vectors similar to the query vector
323 00:18:05.625 --> 00:18:06.965 in our vector database
324 00:18:07.705 --> 00:18:11.685 and retrieve close vectors that correspond
325 00:18:11.685 --> 00:18:13.285 to items in our knowledge base.
326 00:18:14.265 --> 00:18:16.205 Because of the way these models work,
327 00:18:16.775 --> 00:18:19.845 those items will be semantically similar to our query.
328 00:18:20.545 --> 00:18:24.045 In other words, they'll contain information relevant
329 00:18:24.665 --> 00:18:26.805 to the query being answered.
330 00:18:27.825 --> 00:18:29.605 The idea of RAG is very simple:
331 00:18:30.265 --> 00:18:34.525 we just put those into the context of the prompt
332 00:18:34.555 --> 00:18:37.405 that we run the large language model on,
333 00:18:37.425 --> 00:18:38.885 or large vision-language model,
334 00:18:38.945 --> 00:18:40.725 or whatever foundation model you're using.
335 00:18:41.625 --> 00:18:45.565 We augment the user's question with
336 00:18:46.095 --> 00:18:49.325 these retrieved documents from the vector database
337 00:18:50.305 --> 00:18:51.725 and put them into the large language model.
338 00:18:52.425 --> 00:18:55.485 Because the large language model has that context,
339 00:18:56.355 --> 00:18:59.885 it's able to give a much more reliable,
340 00:19:00.345 --> 00:19:01.445 up-to-date answer.
341 00:19:02.105 --> 00:19:03.925 You can think about it as
342 00:19:04.275 --> 00:19:08.225 an external memory for
343 00:19:08.225 --> 00:19:09.745 your RAG system or your agent:
344 00:19:10.605 --> 00:19:13.545 an external memory that you can update
345 00:19:13.605 --> 00:19:15.905 as new facts and new data come in,
346 00:19:16.725 --> 00:19:19.585 without having to retrain your
347 00:19:20.015 --> 00:19:22.105 foundation model, your large language model.
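To make that flow concrete, here is a minimal sketch of the retrieve-augment-generate loop just described, assuming a Milvus collection named "docs" already populated with embedded text chunks; the model names and collection are illustrative choices, not prescribed by the talk.

```python
# Minimal RAG sketch: embed the question, retrieve similar chunks from
# Milvus, and stuff them into the LLM prompt as context.
from pymilvus import MilvusClient
from openai import OpenAI

milvus = MilvusClient("milvus_demo.db")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    resp = llm.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def rag_answer(question: str) -> str:
    # Semantic search: nearest neighbors of the query vector.
    hits = milvus.search(collection_name="docs", data=[embed(question)],
                         limit=5, output_fields=["text"])
    context = "\n".join(hit["entity"]["text"] for hit in hits[0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    chat = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}])
    return chat.choices[0].message.content
```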
348 00:19:24.935 --> 00:19:26.875 I've got two links here.
349 00:19:27.415 --> 00:19:30.595 I think these are some really good resources if you're
350 00:19:30.595 --> 00:19:32.275 getting started with Milvus
351 00:19:32.335 --> 00:19:36.795 or just building generative AI applications
352 00:19:36.795 --> 00:19:38.115 with vector databases in general.
353 00:19:39.055 --> 00:19:42.315 On the left, I've got the GitHub repository for Milvus,
354 00:19:42.335 --> 00:19:44.995 where you've got instructions to download it, a link
355 00:19:44.995 --> 00:19:49.115 to the docs, and a lot of really useful tutorials.
356 00:19:50.055 --> 00:19:51.795 Then on the right-hand side,
357 00:19:51.865 --> 00:19:55.075 I've got our generative AI learning portal,
358 00:19:55.405 --> 00:19:59.275 which has a lot of really useful notebooks
359 00:19:59.945 --> 00:20:02.395 taking you
360 00:20:02.395 --> 00:20:03.795 through the steps of building much more
361 00:20:04.265 --> 00:20:06.035 substantive applications.
362 00:20:06.735 --> 00:20:10.595 It's a really good resource for learning how to build
363 00:20:10.895 --> 00:20:14.195 RAG, agents, recommender systems,
364 00:20:14.395 --> 00:20:16.275 semantic search, and so on.
365 00:20:18.305 --> 00:20:21.205 But let's now turn to a code walkthrough
366 00:20:21.305 --> 00:20:25.445 of DeepSearcher to see
367 00:20:25.745 --> 00:20:29.885 how we actually constructed this research agent.
368 00:20:30.785 --> 00:20:32.525 I think it helps, before we actually go into the code,
369 00:20:32.915 --> 00:20:36.845 to have a mental model of
370 00:20:36.845 --> 00:20:39.565 what it's actually doing,
371 00:20:39.585 --> 00:20:42.285 so that we can
372 00:20:42.285 --> 00:20:43.685 keep that in mind as we're going
373 00:20:43.685 --> 00:20:45.085 through the code and it makes sense.
374 00:20:46.185 --> 00:20:49.885 Similarly to a RAG system,
375 00:20:50.475 --> 00:20:53.445 this research agent has two separate parts.
376 00:20:53.505 --> 00:20:57.445 The first is data ingestion, which happens,
377 00:20:58.465 --> 00:21:00.845 in our case, beforehand.
378 00:21:01.625 --> 00:21:05.165 You tell it what internal documents, crawled web pages,
379 00:21:05.485 --> 00:21:09.245 structured data, or, in theory, streaming data
380 00:21:09.245 --> 00:21:10.525 you want to search over,
381 00:21:11.145 --> 00:21:13.005 and that gets embedded
382 00:21:13.145 --> 00:21:15.485 and stored in Milvus, the vector database.
383 00:21:16.025 --> 00:21:18.565 I think in a future version,
384 00:21:18.665 --> 00:21:21.085 or it's a feature we're adding, there will be a
385 00:21:21.085 --> 00:21:25.125 more dynamic ability to search the web as needed.
386 00:21:26.805 --> 00:21:29.185 Then the other part,
387 00:21:29.285 --> 00:21:31.185 the main part, is online serving.
388 00:21:32.005 --> 00:21:34.865 The user will come in with a query;
389 00:21:36.255 --> 00:21:40.105 then we use a large language model, in our case
390 00:21:40.225 --> 00:21:43.305 a reasoning model, to break down the question
391 00:21:43.935 --> 00:21:46.585 into a number of sub-questions
392 00:21:46.605 --> 00:21:51.055 or subqueries. Then
393 00:21:51.615 --> 00:21:54.775 a router works out which
394 00:21:54.775 --> 00:21:58.695 data store to fetch relevant entries from,
395 00:21:58.695 --> 00:22:00.535 which we then do from the vector database.
396 00:22:01.955 --> 00:22:03.615 And I think this is
397 00:22:03.765 --> 00:22:05.695 what makes it something
398 00:22:05.695 --> 00:22:08.295 you can call an agent rather than just plain RAG:
399 00:22:09.035 --> 00:22:11.095 it has this reflection step
400 00:22:11.605 --> 00:22:14.135 that decides what to do next.
401 00:22:15.075 --> 00:22:17.055 The prompt asks the LLM
402 00:22:17.075 --> 00:22:19.415 to answer the question:
403 00:22:20.515 --> 00:22:24.655 are there any gaps in the questions
404 00:22:25.005 --> 00:22:27.055 that have been asked
405 00:22:27.075 --> 00:22:31.135 and answered so far, using the information from
406 00:22:31.365 --> 00:22:33.255 the data ingestion?
407 00:22:34.075 --> 00:22:38.695 If it says yes, there are still
408 00:22:39.035 --> 00:22:40.815 knowledge gaps to be answered,
409 00:22:41.565 --> 00:22:44.175 then it will generate new queries
410 00:22:45.365 --> 00:22:47.105 and just go through the same process,
411 00:22:47.325 --> 00:22:51.305 looping until it's satisfied. It stops when
412 00:22:51.335 --> 00:22:54.905 it has either exhausted a budget
413 00:22:54.965 --> 00:22:58.665 of iterations or tokens or, more likely,
414 00:22:58.805 --> 00:23:02.545 before then, exhausted all of the questions
415 00:23:02.895 --> 00:23:06.625 that it believes it needs to answer to
416 00:23:06.625 --> 00:23:09.265 cover the query without any knowledge gaps.
417 00:23:10.765 --> 00:23:14.825 So this is why we can call it an agent
418 00:23:15.365 --> 00:23:18.225 rather than just plain RAG:
419 00:23:18.925 --> 00:23:23.865 the LLM is being used to route
420 00:23:24.125 --> 00:23:25.145 the execution.
421 00:23:27.065 --> 00:23:30.605 And I guess we can also think of
422 00:23:30.635 --> 00:23:34.005 calling the vector database in response to
423 00:23:34.635 --> 00:23:36.085 dynamically generated queries
424 00:23:36.085 --> 00:23:38.685 as a form
425 00:23:38.685 --> 00:23:40.605 of tool usage as well.
426 00:23:41.785 --> 00:23:45.725 So we've got conditional execution
427 00:23:46.575 --> 00:23:49.365 and we've got tool usage: two
428 00:23:49.365 --> 00:23:52.245 defining characteristics of being an agent.
429 00:23:53.985 --> 00:23:57.885 And then after that, when it says, okay,
430 00:23:58.415 --> 00:24:02.115 there are no knowledge gaps, it'll move on
431 00:24:02.115 --> 00:24:04.195 to the final step, which is
432 00:24:04.395 --> 00:24:06.875 to use the large language model to
433 00:24:07.015 --> 00:24:10.275 join those answers from the sub-questions
434 00:24:10.905 --> 00:24:14.115 into a single coherent final report.
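In outline, that serving loop looks something like the sketch below. This is a mental-model rendering, not DeepSearcher's actual code; all four helpers are hypothetical stand-ins for the LLM prompts and vector-database calls just described.

```python
# A hedged outline of the serving loop described above. The helpers are
# placeholders standing in for the real LLM prompts and Milvus calls.
def break_down(query: str) -> list[str]:
    return [query]                    # stand-in: LLM splits query into sub-questions

def retrieve(sub_query: str) -> list[str]:
    return []                         # stand-in: vector-database similarity search

def reflect(query: str, sub_queries: list[str], chunks: list[str]) -> list[str]:
    return []                         # stand-in: LLM lists remaining knowledge gaps

def synthesize(query: str, chunks: list[str]) -> str:
    return "report"                   # stand-in: LLM writes the final report

def research(query: str, max_iter: int = 3) -> str:
    sub_queries = break_down(query)
    retrieved: list[str] = []
    for _ in range(max_iter):         # budget of reflection cycles
        for q in sub_queries:
            retrieved += retrieve(q)
        gap_queries = reflect(query, sub_queries, retrieved)
        if not gap_queries:           # no knowledge gaps left: stop looping
            break
        sub_queries = gap_queries     # iterate again on the new questions
    return synthesize(query, retrieved)
```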
435 00:24:15.775 --> 00:24:17.195 One thing
436 00:24:18.335 --> 00:24:23.285 I should mention is that we
437 00:24:23.285 --> 00:24:28.165 are typically using a
438 00:24:28.485 --> 00:24:32.605 reasoning LLM for these steps,
439 00:24:33.305 --> 00:24:35.485 which I'll explain in a bit more detail
440 00:24:36.065 --> 00:24:38.205 a few slides on.
441 00:24:38.665 --> 00:24:41.885 That's something that's really useful
442 00:24:41.885 --> 00:24:43.685 for improving the performance of
443 00:24:44.065 --> 00:24:45.245 this reflection step.
444 00:24:47.375 --> 00:24:50.065 Okay. So let's go across to the GitHub
445 00:24:50.165 --> 00:24:51.985 and actually take a look at some code.
446 00:24:55.025 --> 00:24:57.165 And by the way, don't be shy:
447 00:24:57.165 --> 00:24:59.965 if you have any questions, feel free
448 00:24:59.965 --> 00:25:01.925 to write them in the chat, and I'll stop
449 00:25:01.925 --> 00:25:03.205 and answer them as they come up.
450 00:25:04.545 --> 00:25:08.085 This is the GitHub repository for DeepSearcher.
451 00:25:09.465 --> 00:25:12.085 You can see the architecture diagram there.
452 00:25:14.205 --> 00:25:16.945 And this is a
453 00:25:16.945 --> 00:25:19.465 screenshot of the output.
454 00:25:19.685 --> 00:25:20.985 You can see it's printing
455 00:25:21.805 --> 00:25:23.465 the
456 00:25:23.465 --> 00:25:26.465 intermediate steps,
457 00:25:26.465 --> 00:25:29.025 like the iterations of breaking the question down into subqueries
458 00:25:29.025 --> 00:25:30.705 and answering those;
459 00:25:30.845 --> 00:25:32.465 you can see it says accelerated playback.
460 00:25:32.465 --> 00:25:35.265 That points to one of the
461 00:25:35.325 --> 00:25:39.585 challenges of research agents currently:
462 00:25:39.585 --> 00:25:43.145 they're very expensive in terms of
463 00:25:43.715 --> 00:25:45.105 foundation model inference.
464 00:25:46.285 --> 00:25:49.465 The example I'll be showing later on
465 00:25:49.985 --> 00:25:53.905 made, I think, something like 75 queries
466 00:25:53.905 --> 00:25:56.105 to a reasoning large language model.
467 00:25:56.885 --> 00:26:00.345 That's why it takes 10 minutes,
468 00:26:00.375 --> 00:26:04.225 half an hour, or potentially longer to run
469 00:26:04.225 --> 00:26:05.305 through all of the reasoning steps.
470 00:26:06.685 --> 00:26:08.545 I actually hit the rate limit
471 00:26:08.545 --> 00:26:11.185 when I was doing this with an online service,
472 00:26:11.365 --> 00:26:15.985 so I had to do my 10 queries, wait a minute,
473 00:26:16.605 --> 00:26:17.665 then do another 10 queries.
474 00:26:18.285 --> 00:26:22.945 I think a key point is that
475 00:26:23.015 --> 00:26:25.145 inference is really a key bottleneck.
476 00:26:26.855 --> 00:26:31.345 Okay, we've got a question from Anna Ruda, which is:
477 00:26:31.645 --> 00:26:36.345 how does the LLM know when it has sufficient
478 00:26:36.345 --> 00:26:38.625 knowledge such that there are no knowledge gaps?
479 00:26:39.615 --> 00:26:40.785 Yeah, it's a great question.
480 00:26:41.445 --> 00:26:43.790 I think this just comes down
481 00:26:43.790 --> 00:26:46.045 to the
482 00:26:46.045 --> 00:26:48.165 magic emergent properties of LLMs.
483 00:26:48.545 --> 00:26:52.965 These models have been trained specifically on
484 00:26:52.985 --> 00:26:55.485 multi-step reasoning tasks.
485 00:26:56.385 --> 00:27:01.325 So
486 00:27:01.605 --> 00:27:05.945 it is just one of the
487 00:27:06.325 --> 00:27:08.265 things the LLM can do:
488 00:27:08.855 --> 00:27:10.465 it's been trained on so much data
489 00:27:10.645 --> 00:27:14.105 and so many related tasks in the post-training
490 00:27:14.615 --> 00:27:16.665 that it seems to be able to perform this task as well.
491 00:27:19.245 --> 00:27:21.585 So does it need to be a reasoning model?
492 00:27:21.805 --> 00:27:24.505 No, it doesn't need to be a reasoning model.
493 00:27:25.105 --> 00:27:28.545 I think it would actually be advantageous to use
494 00:27:28.545 --> 00:27:31.185 cheaper models for some of the other steps.
495 00:27:31.605 --> 00:27:34.065 As you mentioned, for breaking down the subtasks,
496 00:27:34.225 --> 00:27:37.105 breaking down the query into subqueries, I think
497 00:27:37.105 --> 00:27:41.505 that's something where you could have a much smaller,
498 00:27:41.535 --> 00:27:44.465 or even just a more
499 00:27:44.465 --> 00:27:46.305 general chatbot LLM,
500 00:27:46.325 --> 00:27:51.185 but ideally a much smaller language model that has
501 00:27:51.565 --> 00:27:53.745 been fine-tuned for that purpose.
502 00:27:55.005 --> 00:27:58.225 I think one of the
503 00:27:58.895 --> 00:28:01.425 easy solutions for speeding up the inference
504 00:28:01.425 --> 00:28:04.225 of these systems is using
505 00:28:04.225 --> 00:28:07.025 smaller specialized models for each of the steps.
506 00:28:08.965 --> 00:28:13.545 So, great question. Okay,
507 00:28:14.765 --> 00:28:16.625 let's go through the code.
508 00:28:16.645 --> 00:28:20.425 If you want to try this at home,
509 00:28:20.525 --> 00:28:23.945 you can git clone this repository, install the
510 00:28:23.945 --> 00:28:27.705 dependencies, and then copy and paste.
511 00:28:27.805 --> 00:28:32.545 Here's an example of
512 00:28:32.685 --> 00:28:35.385 how you actually initiate a call to
513 00:28:35.545 --> 00:28:36.545 the deep research agent.
514 00:28:37.565 --> 00:28:41.145 You can see here we create a default configuration,
515 00:28:42.415 --> 00:28:44.905 then we override some of the options.
516 00:28:45.605 --> 00:28:49.105 We tell the research agent
517 00:28:49.105 --> 00:28:51.385 that we want to use OpenAI
518 00:28:51.925 --> 00:28:54.065 as our large language model inference service,
519 00:28:54.445 --> 00:28:58.705 and specifically we want to use GPT-4o mini.
520 00:28:59.725 --> 00:29:02.065 For the embedding model,
521 00:29:02.315 --> 00:29:04.865 we're also going to use OpenAI's embedding service.
522 00:29:05.925 --> 00:29:08.745 But DeepSearcher supports a number
523 00:29:08.745 --> 00:29:12.665 of different inference and embedding services.
524 00:29:13.485 --> 00:29:15.025 For example, you might want
525 00:29:15.025 --> 00:29:17.305 to use Hugging Face's sentence-transformers
526 00:29:17.305 --> 00:29:18.945 locally for the embedding,
527 00:29:19.645 --> 00:29:22.025 or, as in my case, you might want
528 00:29:22.025 --> 00:29:24.265 to use a distilled version of
529 00:29:24.295 --> 00:29:26.905 DeepSeek R1 for the large language model.
530 00:29:28.455 --> 00:29:33.385 Okay. Then we ingest the
531 00:29:33.385 --> 00:29:35.265 data that we want to search over.
532 00:29:36.085 --> 00:29:38.545 In this case, we are specifying
533 00:29:38.975 --> 00:29:40.905 that data in advance,
534 00:29:40.945 --> 00:29:43.685 but as we develop this, it'll be able to
535 00:29:44.625 --> 00:29:47.405 go off autonomously and find relevant sources.
536 00:29:48.385 --> 00:29:50.245 Then we just need to call
537 00:29:50.245 --> 00:29:51.445 this query function here.
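The snippet being described follows the pattern in the DeepSearcher README; this is a sketch from memory, so treat the exact provider strings, model names, and paths as assumptions that may have drifted from the current repo.

```python
# A rough sketch of the DeepSearcher calls described above, following the
# README of the time; exact names and arguments may have changed since.
from deepsearcher.configuration import Configuration, init_config
from deepsearcher.offline_loading import load_from_local_files
from deepsearcher.online_query import query

config = Configuration()  # default configuration
# Override options: OpenAI for LLM inference (GPT-4o mini) and embeddings.
config.set_provider_config("llm", "OpenAI", {"model": "gpt-4o-mini"})
config.set_provider_config("embedding", "OpenAIEmbedding",
                           {"model": "text-embedding-3-small"})
init_config(config=config)

# Ingest the data we want to search over (specified in advance).
load_from_local_files(paths_or_directory="./my_docs")  # hypothetical path

# Kick off the research agent.
answer = query("Research freestyle snowboards for an intermediate rider.")
```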
538 00:29:52.625 --> 00:29:56.005 One way I like to work out
539 00:29:56.005 --> 00:30:00.805 how code is working is to literally step through,
540 00:30:00.835 --> 00:30:02.125 just step through the functions.
541 00:30:02.265 --> 00:30:05.725 So
542 00:30:06.085 --> 00:30:08.245 I put this into a script and ran it.
543 00:30:08.785 --> 00:30:11.525 Unfortunately, I can't really do a live demo,
544 00:30:11.525 --> 00:30:16.205 because it would take, say, 10-plus minutes to
545 00:30:16.385 --> 00:30:18.445 give back a report.
546 00:30:19.505 --> 00:30:22.205 But imagine we've done that, so we know that it works.
547 00:30:22.205 --> 00:30:24.925 We can say, okay, let's do
548 00:30:24.925 --> 00:30:26.325 step-through debugging,
549 00:30:26.825 --> 00:30:28.925 look inside each of these functions,
550 00:30:28.925 --> 00:30:31.325 and work out how it's actually doing what it's doing.
551 00:30:32.065 --> 00:30:36.805 So we've got here query, from online_query.
552 00:30:36.985 --> 00:30:39.845 I use VS Code,
553 00:30:39.845 --> 00:30:41.365 so you can just jump to the definition.
554 00:30:42.545 --> 00:30:46.005 Then we see that it's calling
555 00:30:46.195 --> 00:30:49.125 this configuration's default_searcher
556 00:30:50.465 --> 00:30:53.165 and calling default_searcher.query on that.
557 00:30:54.915 --> 00:30:57.425 So what is default_searcher?
558 00:30:57.615 --> 00:30:59.585 Well, we'll have to go to the configuration
559 00:30:59.605 --> 00:31:02.265 and see what it's being set to there.
560 00:31:03.805 --> 00:31:05.105 Oh, and just before we go on,
561 00:31:05.105 --> 00:31:08.655 we've got another question from...
562 00:31:15.155 --> 00:31:16.515 actually no, sorry, I think I've already answered that.
563 00:31:16.515 --> 00:31:19.635 That's from Anna, about how it knows
564 00:31:19.635 --> 00:31:20.795 that there are knowledge gaps.
565 00:31:22.575 --> 00:31:27.555 And yeah, like I said,
566 00:31:27.555 --> 00:31:31.355 these foundation models, especially
567 00:31:31.355 --> 00:31:33.035 after they've been post-trained for
568 00:31:33.055 --> 00:31:36.795 different tasks like chat, reasoning, and evaluation,
569 00:31:37.385 --> 00:31:41.315 show massive transfer learning to unseen tasks.
570 00:31:42.055 --> 00:31:46.435 I'm not sure exactly whether this task of
571 00:31:47.035 --> 00:31:48.075 identifying
572 00:31:48.145 --> 00:31:50.315 knowledge gaps was in the training somewhere,
573 00:31:50.775 --> 00:31:53.275 but it's just the power of scale
574 00:31:53.335 --> 00:31:56.555 and transfer learning that it can do tasks like this.
575 00:31:58.465 --> 00:32:00.915 Okay, so we see that
576 00:32:00.985 --> 00:32:05.675 when the configuration is initialized, the default searcher
577 00:32:05.905 --> 00:32:10.795 is created as this RAG router,
578 00:32:12.395 --> 00:32:16.975 and then it creates two agents in here.
579 00:32:18.255 --> 00:32:22.215 It's going to
580 00:32:23.235 --> 00:32:25.975 work out which one to route to, but we'll look inside.
581 00:32:25.995 --> 00:32:28.135 Chain of RAG is a different technique;
582 00:32:28.755 --> 00:32:32.145 if you're interested, you can check out
583 00:32:32.255 --> 00:32:35.065 the research paper that describes it in a bit more detail.
584 00:32:35.845 --> 00:32:40.005 But we're going to look inside the DeepSearch
585 00:32:40.945 --> 00:32:43.285 object here.
586 00:32:44.145 --> 00:32:47.645 It seems like this object is going to contain the logic
587 00:32:48.305 --> 00:32:52.965 to perform, excuse me, to perform this
588 00:32:52.965 --> 00:32:55.325 research agent architecture.
589 00:32:57.545 --> 00:33:00.005 So let's have a look into DeepSearch now.
590 00:33:02.105 --> 00:33:04.805 I've gone across to the file
591 00:33:04.805 --> 00:33:06.885 that contains DeepSearch.
592 00:33:07.265 --> 00:33:09.045 And by the way,
593 00:33:09.065 --> 00:33:10.245 is this large enough for everyone?
594 00:33:10.275 --> 00:33:12.565 I'll just see if I can make the font a bit bigger.
595 00:33:14.895 --> 00:33:19.625 Okay. Here is the definition
596 00:33:19.725 --> 00:33:22.065 of this DeepSearch object.
597 00:33:23.415 --> 00:33:26.915 You can see it takes a number of parameters
598 00:33:27.255 --> 00:33:30.995 to store in the object.
599 00:33:30.995 --> 00:33:34.795 It takes a base LLM, it takes an embedding model,
600 00:33:35.775 --> 00:33:39.195 it takes a vector DB,
601 00:33:40.195 --> 00:33:41.475 and a max number of iterations.
602 00:33:41.735 --> 00:33:44.835 That's going to be a limit on the number
603 00:33:44.835 --> 00:33:47.315 of these reflection cycles that we can do,
604 00:33:51.305 --> 00:33:55.615 as well as some other
605 00:33:55.815 --> 00:33:57.935 settings that are a bit more
606 00:33:57.935 --> 00:33:58.975 miscellaneous,
607 00:33:58.995 --> 00:34:02.415 so I'll jump over them.
608 00:34:03.715 --> 00:34:05.095 Now we know what the object is
609 00:34:05.095 --> 00:34:06.615 that actually performs the query.
610 00:34:07.235 --> 00:34:08.255 I'll just go back here.
611 00:34:08.275 --> 00:34:13.105 You can see that this method here is going to call
612 00:34:13.175 --> 00:34:14.345 DeepSearch.query.
613 00:34:14.525 --> 00:34:15.865 So what does that do?
614 00:34:15.895 --> 00:34:17.705 Well, let's go and have a look at the code.
615 00:34:22.655 --> 00:34:24.505 Okay, so here we are.
616 00:34:24.885 --> 00:34:26.705 Just to reiterate:
617 00:34:26.725 --> 00:34:29.705 I would actually do this with a step-through
618 00:34:29.705 --> 00:34:34.225 debugger in VS Code,
619 00:34:34.345 --> 00:34:37.385 stepping through and stepping into functions
620 00:34:37.805 --> 00:34:40.185 to follow the path of execution.
621 00:34:40.965 --> 00:34:43.105 That's a really good way
622 00:34:43.105 --> 00:34:47.345 to understand what the code is doing
623 00:34:47.345 --> 00:34:50.665 and where
624 00:34:51.275 --> 00:34:53.025 the most relevant code is
625 00:34:53.025 --> 00:34:54.385 for understanding what's going on.
626 00:34:55.285 --> 00:34:59.545 Also,
627 00:35:00.165 --> 00:35:01.865 not everyone's familiar with this, but a really
628 00:35:01.865 --> 00:35:03.465 helpful tool for debugging
629 00:35:03.465 --> 00:35:08.265 and understanding code is the debug
630 00:35:08.265 --> 00:35:12.185 console in VS Code or whatever your IDE is.
631 00:35:12.765 --> 00:35:15.145 When you've stopped execution in your step-through
632 00:35:15.145 --> 00:35:17.945 debugging, you can actually just type expressions
633 00:35:18.615 --> 00:35:19.945 into the debug console.
634 00:35:20.845 --> 00:35:23.025 You can type in things like:
635 00:35:23.025 --> 00:35:24.545 what is the shape of this tensor?
636 00:35:25.335 --> 00:35:30.265 What is the value of this flag?
637 00:35:30.505 --> 00:35:31.745 Does some condition hold?
638 00:35:31.885 --> 00:35:33.065 That's a really useful way
639 00:35:33.065 --> 00:35:35.985 of interrogating the program as it's running
640 00:35:36.325 --> 00:35:37.685 to understand what's going on.
641 00:35:39.025 --> 00:35:41.405 But let's look at this query function.
642 00:35:42.265 --> 00:35:44.925 We can see that the first thing it does is call
643 00:35:46.595 --> 00:35:47.805 self.retrieve.
644 00:35:48.675 --> 00:35:51.965 Okay, so we're untangling
645 00:35:51.985 --> 00:35:56.765 the execution flow to find
646 00:35:56.765 --> 00:35:58.285 where the actual agent logic happens.
647 00:35:59.145 --> 00:36:00.525 I think we're getting a bit closer.
648 00:36:00.745 --> 00:36:02.845 So now let's look at the self.retrieve function.
649 00:36:06.205 --> 00:36:07.065 Where is it?
650 00:36:11.015 --> 00:36:15.875 Oh yeah, here we go. So self.retrieve calls
651 00:36:16.255 --> 00:36:19.275 the asynchronous retrieve method on self.
652 00:36:20.095 --> 00:36:22.435 We are just hacking
653 00:36:22.435 --> 00:36:24.675 through the layers of indirection to get
654 00:36:24.675 --> 00:36:25.755 to the core of it.
655 00:36:27.335 --> 00:36:30.715 It's going to run this
656 00:36:31.575 --> 00:36:33.555 function asynchronously,
657 00:36:34.335 --> 00:36:36.635 and now it looks like we've actually gotten
658 00:36:36.695 --> 00:36:39.315 to the core logic of how
659 00:36:39.655 --> 00:36:41.915 the research agent works.
660 00:36:43.215 --> 00:36:47.945 Okay. We start off by
661 00:36:48.895 --> 00:36:50.185 setting a variable
662 00:36:50.255 --> 00:36:52.425 that contains the maximum number of iterations.
663 00:36:54.565 --> 00:36:59.425 This comment here indicates that
664 00:37:00.365 --> 00:37:03.185 the first thing we'll do is
665 00:37:03.715 --> 00:37:05.865 break down the query into subqueries
666 00:37:06.565 --> 00:37:08.265 by prompting the large language model.
667 00:37:10.125 --> 00:37:13.425 So that's the jump from the user's query
668 00:37:13.485 --> 00:37:15.985 to the first set of subqueries.
669 00:37:19.115 --> 00:37:22.015 And we've got a generate-subqueries method here.
670 00:37:22.235 --> 00:37:26.375 Now that we've reached
671 00:37:26.525 --> 00:37:28.975 the core logic, I won't
672 00:37:28.975 --> 00:37:30.615 jump down further until we've gone
673 00:37:30.615 --> 00:37:32.215 through this entire loop; then we can see
674 00:37:32.375 --> 00:37:36.855 exactly
675 00:37:37.045 --> 00:37:39.375 what prompts the LLM is being given
676 00:37:39.835 --> 00:37:43.295 to do the different tasks, like generating the subqueries
677 00:37:43.475 --> 00:37:45.815 and working out where there are knowledge gaps. A sketch of such a prompt follows below.
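For a flavor of what a sub-query generation step can look like, here is an illustrative sketch; the prompt wording and the llm_chat helper are assumptions, not DeepSearcher's exact code.

```python
import ast

# Illustrative prompt for breaking a question into sub-questions; the real
# DeepSearcher prompt differs in wording.
SUB_QUERY_PROMPT = """To answer this question more comprehensively, break down
the original question into up to four sub-questions. Return only a Python list
of strings, e.g. ["sub-question 1", "sub-question 2"].

Original question: {question}
"""

def generate_sub_queries(llm_chat, question: str) -> list[str]:
    # llm_chat is a hypothetical callable: prompt string in, completion out.
    response = llm_chat(SUB_QUERY_PROMPT.format(question=question))
    return ast.literal_eval(response.strip())  # parse the list literal
```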
678 00:37:47.365 --> 00:37:50.975 Okay, so here we've got the log color print.
679 00:37:50.975 --> 00:37:52.455 This is what you'll see in the terminal
680 00:37:52.525 --> 00:37:53.815 when it's performing this step.
681 00:37:55.155 --> 00:37:58.655 Then it takes the list of current subqueries
682 00:37:58.655 --> 00:38:00.975 and just adds the new ones to it.
683 00:38:03.695 --> 00:38:05.075 And so now we have a loop.
684 00:38:05.215 --> 00:38:08.635 We've entered the main logic loop,
685 00:38:09.545 --> 00:38:11.915 the main iteration.
686 00:38:12.175 --> 00:38:13.515 I'm not sure if you can see my mouse pointer,
687 00:38:13.535 --> 00:38:17.355 but I'm circling around this inner loop in
688 00:38:17.355 --> 00:38:19.835 the online serving area of
689 00:38:20.135 --> 00:38:21.955 the architecture diagram.
690 00:38:24.175 --> 00:38:28.955 The first step is to
691 00:38:29.215 --> 00:38:32.395 search
692 00:38:32.415 --> 00:38:34.875 for relevant chunks from the vector database,
693 00:38:35.535 --> 00:38:40.405 given the query and the subqueries.
694 00:38:41.585 --> 00:38:45.085 This is actually going to return some
695 00:38:45.085 --> 00:38:46.125 asynchronous tasks.
696 00:38:47.065 --> 00:38:52.045 So that's this step of
697 00:38:52.585 --> 00:38:55.325 calling the vector database here
698 00:38:55.475 --> 00:38:57.405 with those queries and subqueries.
699 00:38:59.365 --> 00:39:02.305 Those calls just return tasks;
700 00:39:02.305 --> 00:39:06.145 they don't get executed until we call this await
701 00:39:06.685 --> 00:39:08.065 asyncio.gather.
702 00:39:08.765 --> 00:39:12.265 That actually executes the tasks in parallel
703 00:39:12.405 --> 00:39:15.905 and then waits for the final one to finish
704 00:39:16.485 --> 00:39:17.905 before setting the search results.
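In isolation, that fan-out pattern looks roughly like this; search_chunks here is a hypothetical stand-in for the real Milvus call.

```python
import asyncio

# Hypothetical async wrapper around a single vector-database lookup;
# the real call would hit Milvus instead of sleeping.
async def search_chunks(sub_query: str) -> list[str]:
    await asyncio.sleep(0)
    return [f"chunk for {sub_query!r}"]

async def retrieve_all(sub_queries: list[str]) -> list[str]:
    # Building the coroutines creates the tasks; nothing runs yet.
    tasks = [search_chunks(q) for q in sub_queries]
    # gather schedules them concurrently and waits for the last to finish.
    results = await asyncio.gather(*tasks)
    return [chunk for chunks in results for chunk in chunks]

search_results = asyncio.run(retrieve_all(["q1", "q2", "q3"]))
```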
705 00:39:19.575 --> 00:39:22.185 Okay. Then
706 00:39:22.445 --> 00:39:24.585 we take these results from the subqueries
707 00:39:25.125 --> 00:39:26.145 and we merge them,
708 00:39:26.685 --> 00:39:30.945 and we're also keeping track of how many tokens
709 00:39:31.205 --> 00:39:34.145 we have consumed, because we want to be able to
710 00:39:34.845 --> 00:39:37.905 calculate the cost of this afterwards.
711 00:39:38.685 --> 00:39:40.905 Also, we might want to set a
712 00:39:41.495 --> 00:39:45.985 hard limit,
713 00:39:46.195 --> 00:39:48.185 a token budget, I should say.
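A token budget of the kind just mentioned can be as simple as a counter; this is a generic sketch, not DeepSearcher's actual bookkeeping.

```python
# Generic token-budget bookkeeping of the kind described above.
class TokenBudget:
    def __init__(self, limit: int):
        self.limit = limit
        self.consumed = 0

    def add(self, tokens: int) -> None:
        self.consumed += tokens       # accumulate usage reported by the LLM API

    def exhausted(self) -> bool:
        return self.consumed >= self.limit

budget = TokenBudget(limit=50_000)
budget.add(1_234)                     # e.g. from a response's token usage field
if budget.exhausted():
    print("Stopping: token budget spent.")
```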
714 00:39:49.965 --> 00:39:52.625 And
715 00:39:55.285 --> 00:39:58.135 okay, then we take
716 00:39:58.155 --> 00:40:01.695 the search results and put them into this list of
717 00:40:01.835 --> 00:40:03.575 search results from the vector DB.
718 00:40:05.595 --> 00:40:10.455 In many cases
719 00:40:11.875 --> 00:40:13.615 we are going to have
720 00:40:13.645 --> 00:40:16.455 duplicate chunks returned from the vector database.
721 00:40:17.635 --> 00:40:21.575 So a good step is just to deduplicate those, so
722 00:40:21.575 --> 00:40:23.535 that we have a list of unique chunks
723 00:40:24.085 --> 00:40:26.455 fetched from the vector database
724 00:40:26.805 --> 00:40:28.415 from those subqueries.
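Deduplication here can be as simple as keying chunks on their text while preserving order; a minimal sketch (keying on a chunk ID would work equally well):

```python
# Minimal chunk deduplication: keep the first occurrence of each text.
def deduplicate(chunks: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        if chunk not in seen:
            seen.add(chunk)
            unique.append(chunk)
    return unique

assert deduplicate(["a", "b", "a"]) == ["a", "b"]
```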
725 00:40:31.435 --> 00:40:33.775 So this is where we break
726 00:40:33.955 --> 00:40:36.215 if we've reached the maximum number of iterations.
727 00:40:37.755 --> 00:40:42.015 But otherwise the next step is performing the reflection
728 00:40:42.555 --> 00:40:45.615 and getting additional queries that can
729 00:40:46.145 --> 00:40:47.815 cover any knowledge gaps.
730 00:40:49.055 --> 00:40:52.635 So if we go back here now, you can see that
731 00:40:52.635 --> 00:40:56.195 we've gone through this loop, and now we're in
732 00:40:56.665 --> 00:40:59.035 this yellow-orange diamond,
733 00:40:59.995 --> 00:41:02.895 and we're performing this reflection step.
734 00:41:03.835 --> 00:41:05.495 So this is where the LLM,
735 00:41:05.835 --> 00:41:07.775 or the reasoning model, is going
736 00:41:07.775 --> 00:41:10.295 to actually control the execution.
737 00:41:14.655 --> 00:41:18.275 And so then we just perform the reflection.
738 00:41:18.375 --> 00:41:21.115 So we prompt the reasoning model
739 00:41:21.145 --> 00:41:25.155 with another prompt to generate the gap queries.
740 00:41:26.855 --> 00:41:29.155 So if
741 00:41:29.175 --> 00:41:33.355 the model believes that there are additional queries
742 00:41:33.355 --> 00:41:35.315 that need to be answered to
743 00:41:35.315 --> 00:41:38.275 fill in any knowledge gaps, it will return them
744 00:41:38.375 --> 00:41:40.355 as the gap queries.
745 00:41:41.415 --> 00:41:45.595 So if that's empty, then we know
746 00:41:45.595 --> 00:41:46.835 that we can terminate the loop
747 00:41:48.695 --> 00:41:51.325 and go on to generating the final report.
748 00:41:53.705 --> 00:41:58.165 But otherwise we just add those new subqueries
749 00:41:58.665 --> 00:41:59.685 to the subqueries.
750 00:41:59.685 --> 00:42:01.885 So this is like a stack of
751 00:42:02.065 --> 00:42:03.285 queries to answer.
752 00:42:03.945 --> 00:42:05.565 We add them to that list
753 00:42:06.185 --> 00:42:09.485 and then repeat the iteration.
754 00:42:11.185 --> 00:42:14.005 So we just do another loop around here.
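Putting those pieces together, the inner loop looks roughly like this. It's a sketch, not the project's exact code: it reuses fan_out and deduplicate from the snippets above, and generate_subqueries, generate_gap_queries, and synthesize_report are hypothetical async helpers that each prompt the reasoning model:

```python
async def research(original_query: str, max_iterations: int = 3) -> str:
    subqueries = await generate_subqueries(original_query)
    all_chunks: list[dict] = []
    for _ in range(max_iterations):  # break if we hit the iteration cap
        results = await fan_out(original_query, subqueries)
        merged = all_chunks + [c for r in results for c in r]
        all_chunks = deduplicate(merged)
        # Reflection: ask the reasoning model what is still missing.
        gap_queries = await generate_gap_queries(
            original_query, subqueries, all_chunks)
        if not gap_queries:  # no knowledge gaps left -> stop iterating
            break
        subqueries.extend(gap_queries)  # push gap queries onto the stack
    return await synthesize_report(original_query, all_chunks)
```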
755 00:42:15.945 --> 00:42:18.285 And, you know, that's essentially it.
756 00:42:18.345 --> 00:42:21.165 So if we've got time, I'll just
757 00:42:21.165 --> 00:42:23.765 briefly look into these functions
758 00:42:23.765 --> 00:42:25.005 that actually define the prompts.
759 00:42:25.585 --> 00:42:28.725 And, hold your horses, as it were:
760 00:42:28.845 --> 00:42:30.525 I've just got a few more minutes,
761 00:42:30.585 --> 00:42:32.485 and then I'll give some conclusions,
762 00:42:33.305 --> 00:42:35.005 and then we'll leave five
763 00:42:35.025 --> 00:42:36.485 to ten minutes at the end for any questions.
764 00:42:37.185 --> 00:42:40.245 So actually, what I'll say is I'll
765 00:42:40.245 --> 00:42:43.045 leave this for your personal
766 00:42:43.545 --> 00:42:46.285 enjoyment and education.
767 00:42:46.305 --> 00:42:47.885 You can actually look inside
768 00:42:47.885 --> 00:42:50.605 these methods really easily
769 00:42:50.945 --> 00:42:54.605 and find out how we've actually
770 00:42:54.605 --> 00:42:58.565 formatted the prompts to perform these tasks,
771 00:42:58.585 --> 00:42:59.685 and to do that successfully.
772 00:43:00.465 --> 00:43:01.965 I think it's always good
773 00:43:01.965 --> 00:43:05.365 to actually read the prompt to understand
774 00:43:06.185 --> 00:43:07.525 what the model,
775 00:43:07.525 --> 00:43:09.285 what exactly the model is being instructed to do.
776 00:43:10.105 --> 00:43:11.285 So I encourage you to look
777 00:43:11.285 --> 00:43:14.045 inside these generate-gap-queries,
778 00:43:14.105 --> 00:43:16.645 search-chunks-from-vector,
779 00:43:16.925 --> 00:43:18.165 and generate-subqueries methods, et cetera.
780 00:43:18.905 --> 00:43:22.685 So, very quickly:
781 00:43:22.685 --> 00:43:25.725 after it's done that and terminates, it returns
782 00:43:25.745 --> 00:43:26.805 to this query function,
783 00:43:27.545 --> 00:43:31.205 and now we're in these two steps here of
784 00:43:31.725 --> 00:43:36.125 synthesizing the report from all of these subqueries
785 00:43:36.345 --> 00:43:37.965 and retrieved chunks.
786 00:43:39.385 --> 00:43:41.005 And, you know, that's just more
787 00:43:41.005 --> 00:43:42.205 prompting of the same model.
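The synthesis step is just one more prompt. A sketch, assuming llm_chat is a hypothetical async wrapper around whatever inference API you use:

```python
async def synthesize_report(original_query: str, chunks: list[dict]) -> str:
    # Concatenate the unique retrieved chunks into one context block.
    context = "\n\n".join(c["chunk"] for c in chunks)
    prompt = (
        "Using only the context below, write a detailed, well-structured "
        f"report answering this question: {original_query}\n\n"
        f"Context:\n{context}"
    )
    return await llm_chat(prompt)  # hypothetical async inference call
```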
788 00:43:43.375 --> 00:43:47.555 And when you've done that,
789 00:43:47.555 --> 00:43:48.795 it may take ten minutes,
790 00:43:49.805 --> 00:43:53.025 thirty minutes, depending on what inference service you use
791 00:43:53.025 --> 00:43:56.225 and the question. It will have done multiple iterations
792 00:43:56.405 --> 00:43:58.305 of this reasoning:
793 00:43:59.065 --> 00:44:01.865 breaking down the question into a number of
794 00:44:01.995 --> 00:44:05.625 steps to answer it, working out whether
795 00:44:05.765 --> 00:44:07.425 it should keep going or finish.
796 00:44:07.965 --> 00:44:09.865 And it generates a nice little report.
797 00:44:10.525 --> 00:44:12.145 And I've got an example here.
798 00:44:12.765 --> 00:44:17.235 So the question was: how has The Simpsons
799 00:44:17.455 --> 00:44:18.595 evolved over time?
800 00:44:19.735 --> 00:44:22.155 And it's put together this nice little report,
801 00:44:22.155 --> 00:44:27.075 really covering all bases, with a nice
802 00:44:27.075 --> 00:44:28.475 conclusion tying things together.
803 00:44:29.575 --> 00:44:33.285 So let's go back to the slides
804 00:44:33.585 --> 00:44:34.885 and we'll wrap things up.
805 00:44:38.505 --> 00:44:42.115 So, could I give a rough overview of the prompts?
806 00:44:42.855 --> 00:44:45.915 I think, just for time...
807 00:44:51.605 --> 00:44:52.455 yeah, let's have a look.
808 00:45:01.515 --> 00:45:04.045 Yeah, so for time, sorry,
809 00:45:04.385 --> 00:45:06.405 I think I'm going to have to skip looking at the
810 00:45:06.405 --> 00:45:08.205 exact code.
811 00:45:08.665 --> 00:45:13.395 But let me just point you to... geez,
812 00:45:13.415 --> 00:45:15.635 now I've lost my place a bit.
813 00:45:22.065 --> 00:45:24.235 Okay. So this is also in DeepSearcher,
814 00:45:24.375 --> 00:45:28.955 and we've got the subquery prompt. And you can see,
815 00:45:28.955 --> 00:45:31.555 these prompts are actually in the deep search Python file.
816 00:45:32.215 --> 00:45:35.035 So for example: "You are an AI content analysis expert,
817 00:45:35.035 --> 00:45:37.475 good at summarizing content. Please summarize..."
818 00:45:37.655 --> 00:45:38.715 and so on.
819 00:45:39.625 --> 00:45:40.795 Then there's
820 00:45:40.835 --> 00:45:43.875 a reflection prompt: "Determine whether additional search
821 00:45:44.035 --> 00:45:46.355 queries are needed based on the original query," et cetera.
822 00:45:47.135 --> 00:45:48.635 There's a re-ranking prompt
823 00:45:49.815 --> 00:45:51.235 and there's a subquery prompt.
824 00:45:51.535 --> 00:45:52.995 So you can see we've got
825 00:45:53.095 --> 00:45:56.195 at least four different types of prompts
826 00:45:56.195 --> 00:45:58.115 for different subtasks.
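To give a feel for their shape, here are paraphrased versions of two of those templates; the real project's wording and placeholder names will differ:

```python
# Paraphrased prompt templates, illustrating the shape described above;
# not the project's verbatim text.
SUBQUERY_PROMPT = """\
To answer the question more comprehensively, break the original
question into up to four sub-questions. Return a list of strings.

Original question: {original_query}
"""

REFLECTION_PROMPT = """\
Determine whether additional search queries are needed based on the
original query, the previous sub-queries, and the retrieved chunks.
If further research is needed, return a list of new queries;
otherwise return an empty list.

Original query: {original_query}
Previous sub-queries: {subqueries}
Retrieved chunks: {chunks}
"""
```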
827 00:45:59.015 --> 00:46:00.595 But check out this file
828 00:46:00.615 --> 00:46:02.675 and you can
829 00:46:03.285 --> 00:46:04.595 look at them in some more detail.
830 00:46:05.135 --> 00:46:07.995 But going back to the slides, let's see.
831 00:46:15.785 --> 00:46:20.555 Okay, so what's
832 00:46:20.555 --> 00:46:23.835 some of the secret sauce behind how
833 00:46:23.835 --> 00:46:25.035 these agents work?
834 00:46:25.975 --> 00:46:28.075 Well, I think one thing is this idea
835 00:46:28.095 --> 00:46:29.555 of conditional computation.
836 00:46:30.095 --> 00:46:33.315 And that means that the model can actually decide
837 00:46:33.695 --> 00:46:37.595 how much computation to do based on the current
838 00:46:37.595 --> 00:46:40.195 state of the model output.
839 00:46:40.195 --> 00:46:42.275 And this can be done in a number of different ways.
840 00:46:43.135 --> 00:46:46.435 So one
841 00:46:46.435 --> 00:46:49.595 more complex way is you could actually introduce
842 00:46:49.595 --> 00:46:53.555 reasoning tokens that tell the model to
843 00:46:53.555 --> 00:46:56.515 keep generating intermediate output,
844 00:46:56.845 --> 00:46:58.155 doing additional computation,
845 00:46:58.155 --> 00:46:59.675 until some termination condition.
846 00:47:00.095 --> 00:47:02.515 Apparently DeepSeek doesn't use this method,
847 00:47:03.175 --> 00:47:05.915 but, you know, this is a good strategy
848 00:47:05.975 --> 00:47:07.275 for conditional computation.
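As a toy illustration of that token-gated idea (every name here is made up, and as noted, this is not how DeepSeek does it):

```python
END_OF_REASONING = "</think>"  # hypothetical termination marker
MAX_STEPS = 1024               # safety cap on reasoning length

def reason(model, prompt: str) -> str:
    # The model keeps generating intermediate "thought" tokens until it
    # decides to emit the termination marker: the amount of computation
    # is conditional on the model's own output.
    output = prompt
    for _ in range(MAX_STEPS):
        token = model.next_token(output)  # hypothetical one-step decode API
        output += token
        if token == END_OF_REASONING:
            break
    return output
```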
849 00:47:08.695 --> 00:47:10.635 I think the second one is this
850 00:47:11.115 --> 00:47:12.195 reinforcement learning training.
851 00:47:12.415 --> 00:47:15.195 Conceptually it's really simple:
852 00:47:15.195 --> 00:47:18.595 taking a strong base model,
853 00:47:18.815 --> 00:47:19.995 like DeepSeek did,
854 00:47:20.615 --> 00:47:23.195 and then applying a form
855 00:47:23.195 --> 00:47:24.835 of reinforcement learning on
856 00:47:24.835 --> 00:47:26.035 very high quality reasoning data.
857 00:47:26.455 --> 00:47:28.395 And then there's this aha moment where
858 00:47:29.025 --> 00:47:30.965 the model just starts to learn how to reason.
859 00:47:34.025 --> 00:47:38.645 So, it's not all
860 00:47:38.645 --> 00:47:39.685 rainbows and sunshine.
861 00:47:40.195 --> 00:47:41.325 There are, of course,
862 00:47:41.485 --> 00:47:42.925 I think some quite major challenges with these.
863 00:47:42.925 --> 00:47:45.125 And I think the first one is just the cost.
864 00:47:45.785 --> 00:47:49.605 So if you're using OpenAI's deep research agent,
865 00:47:49.625 --> 00:47:51.205 you'll have to have their Pro subscription,
866 00:47:51.245 --> 00:47:52.605 which I think is $200 a month,
867 00:47:53.325 --> 00:47:56.725 and I believe they're still not covering the cost
868 00:47:56.725 --> 00:47:58.845 of inference by charging that.
869 00:47:59.625 --> 00:48:01.325 And so one thing I discovered
870 00:48:01.325 --> 00:48:02.925 actually running these queries is
871 00:48:03.785 --> 00:48:05.725 how much inference they actually require.
872 00:48:05.725 --> 00:48:09.005 Firstly, these reasoning models
873 00:48:09.075 --> 00:48:12.605 just typically use a lot more inference
874 00:48:12.605 --> 00:48:14.125 going through their number of reasoning steps
875 00:48:14.585 --> 00:48:16.045 for a given prompt,
876 00:48:16.065 --> 00:48:19.045 but there's also the fact that it'll need to make multiple calls
877 00:48:19.065 --> 00:48:20.325 of these,
878 00:48:20.875 --> 00:48:23.565 many more calls than a simple RAG system.
879 00:48:25.345 --> 00:48:29.445 Again, hallucinations: as with all
880 00:48:29.445 --> 00:48:32.085 foundation models, it's a general problem,
881 00:48:32.115 --> 00:48:33.525 and there can be reasoning errors.
882 00:48:33.745 --> 00:48:37.805 So in its intermediate reasoning trace,
883 00:48:37.985 --> 00:48:40.325 if there's some incorrect step,
884 00:48:40.955 --> 00:48:43.045 then all the following steps could fail.
885 00:48:43.945 --> 00:48:47.875 And then I think finally, to actually
886 00:48:47.875 --> 00:48:50.275 train these reasoning models, we need
887 00:48:50.275 --> 00:48:52.395 to have really high quality
888 00:48:52.865 --> 00:48:54.915 open source reasoning-trace datasets.
889 00:48:55.455 --> 00:48:57.715 And that's something that people are working on, so
890 00:48:57.715 --> 00:49:01.475 that we can reproduce some of these results from
891 00:49:01.475 --> 00:49:04.115 OpenAI and its competitors.
892 00:49:05.745 --> 00:49:08.595 Okay. So I can tell Sachi is
893 00:49:08.595 --> 00:49:10.155 hurrying me along, so, very quickly.
894 00:49:10.255 --> 00:49:13.155 So, in terms of the cost, some
895 00:49:13.155 --> 00:49:14.835 of the solutions people are working on for
896 00:49:14.835 --> 00:49:17.155 that are specialized hardware,
897 00:49:17.155 --> 00:49:18.955 but also other types of reasoning.
898 00:49:18.975 --> 00:49:23.115 There's an idea called continuous chain of thought that
899 00:49:23.615 --> 00:49:25.875 doesn't use discrete tokens to reason,
900 00:49:25.935 --> 00:49:28.555 but actually uses a continuous latent variable,
901 00:49:28.685 --> 00:49:32.035 which is a lot more cost effective.
902 00:49:32.735 --> 00:49:35.555 And then there are these barriers to entry, both
903 00:49:35.555 --> 00:49:37.155 with the open source software
904 00:49:37.855 --> 00:49:39.915 and with the data and the models.
905 00:49:40.695 --> 00:49:42.875 So players like Zilliz
906 00:49:42.895 --> 00:49:46.715 and Hugging Face are working on
907 00:49:46.955 --> 00:49:48.955 reproducing these results fully open source,
908 00:49:49.455 --> 00:49:50.915 so that new folks can
909 00:49:50.915 --> 00:49:53.395 take away the learnings, with systems that work,
910 00:49:53.415 --> 00:49:55.875 and really easily build
911 00:49:55.875 --> 00:49:57.195 successful research agents.
912 00:49:58.535 --> 00:50:00.315 So I think that's it from me,
913 00:50:00.535 --> 00:50:03.635 and it looks like I went a bit over time,
914 00:50:03.635 --> 00:50:05.795 but I think we've got five minutes left for questions.
915 00:50:06.585 --> 00:50:08.435 Yeah. So we have a few questions,
916 00:50:08.495 --> 00:50:10.115 so let's just go through them.
917 00:50:10.935 --> 00:50:13.635 How does the vector embedding work
918 00:50:13.635 --> 00:50:17.155 for images? Are vectors created for pixels
919 00:50:17.155 --> 00:50:18.195 or image portions?
920 00:50:21.835 --> 00:50:25.925 Yeah. Okay.
921 00:50:26.805 --> 00:50:29.025 So,
922 00:50:29.025 --> 00:50:33.265 this framework
923 00:50:33.265 --> 00:50:35.545 that I presented is very general:
924 00:50:36.165 --> 00:50:39.185 all you need is some concept of embedding
925 00:50:39.565 --> 00:50:41.585 and some sort of foundation model.
926 00:50:42.205 --> 00:50:46.585 So typically,
927 00:50:47.555 --> 00:50:49.345 there are really good open source models
928 00:50:49.575 --> 00:50:51.705 that can perform embedding of images,
929 00:50:52.485 --> 00:50:54.625 and they typically work on the whole image.
930 00:50:55.205 --> 00:50:58.665 They might be looking at patches, putting
931 00:50:58.665 --> 00:51:00.225 those into a vision transformer,
932 00:51:00.685 --> 00:51:02.865 and then outputting an embedding for the entire image.
933 00:51:03.805 --> 00:51:06.545 And there are models that will allow you
934 00:51:06.545 --> 00:51:10.305 to embed images into the same space as text.
935 00:51:11.005 --> 00:51:14.385 So all of the same concepts apply here.
936 00:51:14.805 --> 00:51:17.025 You just use a different embedding model that's specific
937 00:51:17.045 --> 00:51:18.545 for images, or for images and text.
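For example, a CLIP-style model embeds images and text into the same space. A sketch using Hugging Face transformers (the checkpoint name is just one common choice, and the image file is hypothetical):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# CLIP splits the image into patches, runs a vision transformer, and
# returns one vector per image, aligned with its text embeddings.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame.png")  # hypothetical local image file
inputs = processor(text=["a cartoon family on a couch"],
                   images=image, return_tensors="pt", padding=True)

image_vec = model.get_image_features(pixel_values=inputs["pixel_values"])
text_vec = model.get_text_features(input_ids=inputs["input_ids"],
                                   attention_mask=inputs["attention_mask"])
# Either vector can now be indexed in the vector database.
```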
938 00:51:21.405 --> 00:51:24.145 So: in the iterated search-reasoning cycle,
939 00:51:25.085 --> 00:51:27.305 is the iterated reasoning being performed
940 00:51:27.305 --> 00:51:29.705 by the reasoning LLM, and the search by the vector DB?
941 00:51:30.325 --> 00:51:33.505 So the LLM
942 00:51:33.525 --> 00:51:36.105 never actually performs an action; it just
943 00:51:36.105 --> 00:51:38.465 gives the instruction to perform an action.
944 00:51:39.045 --> 00:51:43.105 So it will say, okay, search the vector database for
945 00:51:43.165 --> 00:51:45.545 this, and then the code will actually
946 00:51:45.545 --> 00:51:46.785 perform that search.
947 00:51:47.165 --> 00:51:50.185 But yes, it's the vector database that is performing
948 00:51:50.185 --> 00:51:51.305 that similarity search.
949 00:51:51.845 --> 00:51:54.825 The LLM just requests an instance
950 00:51:54.845 --> 00:51:55.945 of tool usage.
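In code, that division of labor looks roughly like this (a sketch; the JSON tool-call format is an assumption):

```python
import json

def dispatch(llm_output: str, vector_db) -> list[dict]:
    # The LLM only *describes* the action as structured output,
    # e.g. {"tool": "search", "query": "Simpsons early seasons"};
    # plain code parses that and executes it.
    call = json.loads(llm_output)
    if call["tool"] == "search":
        return vector_db.search(call["query"])  # the DB does the real work
    raise ValueError(f"Unknown tool: {call['tool']}")
```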
951 00:51:57.205 --> 00:51:58.745 Does it need to be a reasoning model?
952 00:51:58.765 --> 00:51:59.785 So yeah, I think we've covered this.
953 00:51:59.885 --> 00:52:04.585 No; I think it's actually probably good if many
954 00:52:04.585 --> 00:52:06.425 of the other steps are not reasoning models,
955 00:52:06.425 --> 00:52:10.185 because you can reduce the cost of inference.
956 00:52:10.185 --> 00:52:14.945 Is the search the system runs semantic search?
957 00:52:15.245 --> 00:52:19.685 So,
958 00:52:20.595 --> 00:52:24.365 yeah, so Milvus has support for
959 00:52:24.625 --> 00:52:26.645 hybrid search,
960 00:52:27.185 --> 00:52:31.805 so lexical plus semantic search. With Milvus 2.5,
961 00:52:32.305 --> 00:52:33.325 you could implement that,
962 00:52:33.865 --> 00:52:35.405 and that would just
963 00:52:35.405 --> 00:52:39.685 be a very small modification to that
964 00:52:39.685 --> 00:52:42.725 vector database lookup step in the research agent.
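A sketch of what that modification could look like with pymilvus, combining a dense (semantic) request with a BM25 full-text (lexical) request; the collection name, field names, schema, and embed helper are all assumptions:

```python
from pymilvus import AnnSearchRequest, MilvusClient, RRFRanker

client = MilvusClient(uri="http://localhost:19530")

query_text = "How has The Simpsons evolved over time?"
query_vector = embed(query_text)  # hypothetical dense embedding helper

# Dense request: semantic similarity over an embedding field.
dense_req = AnnSearchRequest(data=[query_vector], anns_field="dense_vector",
                             param={"metric_type": "IP"}, limit=10)
# Sparse request: Milvus 2.5 full-text search scores the raw query text
# against a BM25-backed sparse field.
sparse_req = AnnSearchRequest(data=[query_text], anns_field="sparse_vector",
                              param={"metric_type": "BM25"}, limit=10)

# Fuse the two result lists with reciprocal rank fusion.
results = client.hybrid_search(collection_name="chunks",
                               reqs=[dense_req, sparse_req],
                               ranker=RRFRanker(), limit=10)
```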
965 00:52:44.345 --> 00:52:47.245 Can we choose the subquery number parameter?
966 00:52:47.505 --> 00:52:49.965 So I think this is one of those things
967 00:52:49.965 --> 00:52:51.485 where it's actually best left to the system.
968 00:52:51.825 --> 00:52:54.845 We are designing the system to be
969 00:52:54.865 --> 00:52:57.045 as autonomous as possible.
970 00:52:57.985 --> 00:53:00.605 You could hard-code
971 00:53:00.675 --> 00:53:02.645 a maximum number.
972 00:53:02.945 --> 00:53:04.565 You could re-rank
973 00:53:04.565 --> 00:53:06.045 them and take a maximum.
974 00:53:06.825 --> 00:53:09.845 I think the simplest implementation, though, just lets
975 00:53:09.845 --> 00:53:12.165 the foundation model decide how many
976 00:53:12.345 --> 00:53:13.765 there should be.
977 00:53:14.185 --> 00:53:17.285 But yeah, that's just a design choice.
978 00:53:17.885 --> 00:53:19.205 I would just recommend letting the
979 00:53:19.205 --> 00:53:23.125 model actually decide that. Okay.
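If you did want a hard cap rather than full autonomy, the change is tiny (a sketch; MAX_SUBQUERIES is a hypothetical knob):

```python
MAX_SUBQUERIES = 5  # hypothetical design knob

def cap_subqueries(subqueries: list[str]) -> list[str]:
    # Optionally re-rank first so the most useful queries survive the
    # cut; here we simply truncate in generation order.
    return subqueries[:MAX_SUBQUERIES]
```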
980 00:53:23.585 --> 00:53:25.405 You do have a few questions in the chat.
981 00:53:25.905 --> 00:53:29.925 Have you benchmarked this against the OpenAI solution?
982 00:53:30.025 --> 00:53:32.805 And does the framework also provide tools
983 00:53:32.945 --> 00:53:36.005 to do web crawling to collect relevant data for the query?
984 00:53:36.595 --> 00:53:39.325 Yeah, great question. So I guess
985 00:53:39.325 --> 00:53:42.325 these different open source
986 00:53:43.195 --> 00:53:46.245 deep research agents have had
987 00:53:46.245 --> 00:53:47.245 different goals.
988 00:53:47.745 --> 00:53:50.325 And the goal of ours was not so much
989 00:53:50.385 --> 00:53:54.365 to reproduce the specific benchmark that
990 00:53:54.585 --> 00:53:58.605 OpenAI ran theirs on,
991 00:53:58.985 --> 00:54:00.045 but to produce a
992 00:54:00.045 --> 00:54:01.125 system that's understandable,
993 00:54:01.865 --> 00:54:04.005 that we can use for teaching purposes.
994 00:54:04.465 --> 00:54:06.645 But I recommend checking out
995 00:54:06.945 --> 00:54:09.565 the deep research agent from Hugging Face, where
996 00:54:09.565 --> 00:54:13.445 one of their primary motivations was
997 00:54:13.985 --> 00:54:15.805 to achieve a similar number
998 00:54:16.025 --> 00:54:18.325 or even exceed the benchmark, which they did.
999 00:54:19.605 --> 00:54:21.165 I think it's also just interesting to
1000 00:54:21.165 --> 00:54:24.245 compare different architectures for research agents.
1001 00:54:26.175 --> 00:54:28.225 Okay. Does this framework also provide tools
1002 00:54:28.285 --> 00:54:30.545 to do the web crawling to collect relevant data?
1003 00:54:31.325 --> 00:54:34.745 So, yes,
1004 00:54:34.745 --> 00:54:37.865 we've got the tools to call a number
1005 00:54:37.865 --> 00:54:40.065 of different web crawling services.
1006 00:54:40.845 --> 00:54:43.025 I think we're still adding that
1007 00:54:43.045 --> 00:54:45.065 as a dynamic tool call;
1008 00:54:45.645 --> 00:54:47.625 I think that's something for the near future.
1009 00:54:48.165 --> 00:54:51.625 But you can just say, okay, here is a domain name,
1010 00:54:51.785 --> 00:54:54.105 I want to fetch all
1011 00:54:54.105 --> 00:54:57.865 of my data from this domain, and then it'll call Firecrawl
1012 00:54:57.925 --> 00:54:59.145 or whatever service you're using,
1013 00:54:59.565 --> 00:55:03.265 pull that in, index it, and then run your query.
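Conceptually that ingestion path is crawl, chunk, embed, index, then query. A sketch where crawl_domain, split_into_chunks, embed, and the collection object are hypothetical stand-ins for a Firecrawl-style client, a chunker, an embedding model, and the vector database:

```python
def ingest_domain(domain: str, collection) -> None:
    # Crawl every page under the domain, chunk it, embed each chunk,
    # and insert it into the vector database.
    for page in crawl_domain(domain):          # Firecrawl-style crawl
        for chunk in split_into_chunks(page):  # hypothetical chunker
            collection.insert({"text": chunk, "vector": embed(chunk)})

# After ingestion, the research agent's vector-database lookups run
# against this collection exactly as before.
```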
1014 00:55:05.805 --> 00:55:08.865 So, great question. And I think that brings us
1015 00:55:08.925 --> 00:55:11.065 just about up to time. Yeah,
1016 00:55:11.295 --> 00:55:12.745 right at the top of the hour.
1017 00:55:12.925 --> 00:55:15.905 So thank you all so much for joining today.
1018 00:55:16.365 --> 00:55:19.425 Stefan's put his information on the screen here if
1019 00:55:19.425 --> 00:55:20.545 you have questions for him.
1020 00:55:20.925 --> 00:55:22.585 We also have office hours:
1021 00:55:22.965 --> 00:55:25.345 if you want
1022 00:55:25.645 --> 00:55:28.625 a specialized one-on-one session,
1023 00:55:28.885 --> 00:55:30.425 the QR code for that is right here.
1024 00:55:30.965 --> 00:55:33.785 And we also have a workshop coming up in
1025 00:55:33.785 --> 00:55:34.865 person in Palo Alto.
1026 00:55:35.125 --> 00:55:37.865 If you are based in the Bay Area, which I did see some
1027 00:55:37.865 --> 00:55:41.145 of you are, we hope you register for that.
1028 00:55:41.865 --> 00:55:43.425 So thank you all for joining today,
1029 00:55:43.685 --> 00:55:46.505 and we look forward to seeing you at our next webinar.
1030 00:55:47.055 --> 00:55:48.105 Have a good rest of your day.
1031 00:55:48.365 --> 00:55:49.425 Thanks everyone for coming,
1032 00:55:49.485 --> 00:55:52.225 and hope to see you in March for our
1033 00:55:52.225 --> 00:55:53.465 workshop with OpenAI.
1034 00:55:53.655 --> 00:55:54.305 Okay. Take care.