What Makes "Deep Research"? A Dive into AI Agents

About this webinar:
Unless you live under a rock, you will have heard about OpenAI’s release of Deep Research on Feb 2, 2025. This new product promises to revolutionize how we answer questions requiring the synthesis of large amounts of diverse information. But how does this technology work, and why is Deep Research a noticeable improvement over previous attempts? In this webinar, we will examine the concepts underpinning modern agents using our basic clone, Deep Searcher, as an example.
[Figure: Deep Searcher architecture diagram]
Topics covered:
- Tool use
- Structured output
- Reflection
- Reasoning models
- Planning
- Types of agentic memory
Transcript (lightly edited for clarity):
I'm pleased to introduce today's session, "What Makes 'Deep Research'? A Dive into AI Agents," and our guest speaker, Stefan Webb. Stefan is a developer advocate at Zilliz, where he advocates for the open source vector database Milvus. Prior to this, he spent three years in industry as an applied ML researcher at Twitter and Meta, collaborating with product teams to tackle their most complex challenges. Stefan holds a PhD from the University of Oxford and has published papers at leading machine learning conferences such as NeurIPS, ICLR, and ICML. He is passionate about generative AI and eager to leverage his deep technical expertise to contribute to the open source community. Welcome, Stefan.
Thanks so much, Sachi, for the kind introduction. You're right, I'm very passionate about generative AI, and also about helping developers, so I really love doing webinars like this and meeting some of our users and people who are simply interested in vector databases and AI.

Just a tiny bit more about myself before I get started. I'm a developer advocate for Zilliz, the company behind the leading open source vector database, Milvus. As a developer advocate, I serve as a bridge between the users of Milvus and its developers: providing technical support to users, connecting users with engineers for deeper technical support, running events like these webinars and our monthly meetup in the Bay Area, and producing written content as well.
I've put my LinkedIn there; I always love connecting with new folks. I love hearing what you're building with generative AI, what your challenges are, and what your visions are; that's my bread and butter. So please connect with me on LinkedIn, I would love to hear from you. And if you're building a RAG or agent system at your startup or company, there's a really good opportunity for a developer advocate at a company like Zilliz to help and provide some consultation.
With that, let's get started with the webinar. The topic for today is what makes Deep Research, and I've subtitled it "A Dive into AI Agents." We're going to be talking about research agents specifically, but a lot of this also relates to generative AI agents in general.

I'll start off by giving a little background on OpenAI's Deep Research release, and then I'm going to introduce an open source research agent inspired by it, produced by engineers at Zilliz and fully open sourced. I say demo, but it's really more of a code walkthrough; I'll explain how it was put together. After that I'll talk about some of the ideas behind agents in general, and research agents in particular: what's new, and why Deep Research has come on the scene so recently. From that discussion it should become clear what some of the challenges and obstacles to wider adoption are, so we'll cover those and some potential solutions. That should give you a sense of where things are headed over the short term, the next six to twelve months. So with that, let's get started.
By the way, feel free to ask questions in the chat as they occur to you. I'll stop whenever a question comes in, or try my best to, and take them as they come.
I'm sure everyone here has heard about one of OpenAI's new product releases, Deep Research, which came out near the start of February, last month. It's a bit of a different product from their straight chatbot, in that it is able to go off, search the web, and use other tools to build a really detailed report answering your question.

I've taken a screenshot here as an example. The question in this case might have been: please research freestyle snowboards suitable for an intermediate rider, with the user giving some details like their height, weight, shoe size, et cetera. The agent then goes off, searches the web, works out how to answer the question, iterating from one step to the next, and after some time, which could be eight minutes or thirty, synthesizes a really detailed, coherent, informed report. So this is much different from plain old ChatGPT, which returns an answer more or less in real time rather than going off and working through a lot of autonomous steps.
I think the reason this exploded in the media is that people were really impressed by the results. It seemed to do a very good job of actually researching a topic that might require not just a plain answer but going off, looking at multiple sources, and asking further questions. I read somewhere that one professor said it could replace an early-stage PhD student in terms of doing research, and other professionals were likewise really impressed with the results.
But what exactly was new about it? It wasn't the first commercially released research agent; Google's Deep Research came out about a month earlier, in December. So what was it about this one that produced qualitatively much superior output? I think the honest answer is that we're not really sure, because it's closed source; the design is a tightly guarded secret. But from the rumor mill, which I suppose means people speaking to insiders,
plus the blog announcement that OpenAI released, it seems a big element was end-to-end training with reinforcement learning on really high-quality reasoning-trace data, which we'll discuss more in a minute. Again, there are possibly other things in the design; we just don't know, because it's closed source. But we can guess at them by trying to reproduce a system that achieves similar results on the same benchmarks. One such model is from DeepSeek, namely DeepSeek R1, and we'll talk about that a bit later on.
Okay. So what exactly is a research agent, and how does it differ from an agent in the general sense? It's one of those things in generative AI where people still disagree on the definitions; we're still coalescing around an exact one. But my definition, which I think overlaps with many people's, is that it's an agent whose goal is to do research, in the sense that it has to go off and discover many relevant sources. It's not just doing a single lookup against a vector database or accessing a single Wikipedia page; it's making a decision about which sources to search, breaking the question down into multiple steps, autonomously reasoning through answering the question, and then synthesizing a detailed report at the end.
I've pulled some quotes here from the Deep Research release blog, and I see three themes: iteration, search (or, more broadly, tool usage), and reasoning. Under iteration, the release blog post described things like learning "to plan and execute a multi-step trajectory," as well as "backtracking and reacting to real-time information." That's clearly describing an agent that can decide what to do next autonomously, pivoting as needed in reaction to the information it encounters. The second theme, and there's overlap between these three,
is search: the blog post mentions training "end-to-end with reinforcement learning on hard browsing and reasoning tasks across a range of domains," which is generally supposed to be the main secret-sauce ingredient, as well as being "optimized for web browsing and data analysis." The third theme, which overlaps with iteration and search, is reasoning: Deep Research was "fine-tuned on the upcoming OpenAI o3 reasoning model" and "leverages reasoning to search, interpret, and analyze massive amounts of text." So from this blog post we can piece together how it might work, relate that to the latest developments in generative AI, and try to reproduce the results by building our own system.
And that's exactly what we did. We were very excited by the release once we saw the quality of the output, and, being a vector database company, with vector databases one of the core components powering agents, we were really curious: could we build our own open source version that works similarly? That's what we did about a month ago: some of our engineers built an open source project called Deep Searcher. Like Deep Research, you give it a query; it goes off, searches through multiple sources, breaks the question down into steps it can iterate over, decides when to stop, and then synthesizes a detailed report from all that information.
This research agent is built on top of the vector database Milvus, to which Zilliz is the main contributor; it has belonged to the Linux Foundation for AI and Data since, I think, 2020. So let me say a few words about Milvus before we go into the code of Deep Searcher.

Milvus is fully open source under the Apache license, so it's suitable for commercial use, and it's very simple to use. You can pip install the lite version right in your notebook; a much more scalable version can be launched as a Docker image really easily; and there's a third, fully distributed version, Milvus Cluster, that you launch on a cluster of machines via Kubernetes, which can scale to literally tens to hundreds of billions of vectors.
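To make the easy setup concrete, here is a minimal sketch of the pip-installable Milvus Lite path using the pymilvus client; the file, collection, and field names are illustrative:

```python
# Minimal Milvus Lite sketch: `pip install pymilvus`, then store and search
# vectors in a local file, no server required. Names here are illustrative.
from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")        # local Milvus Lite database
client.create_collection(collection_name="demo", dimension=8)

# Insert a few toy vectors with an attached text field.
client.insert(collection_name="demo",
              data=[{"id": i, "vector": [float(i)] * 8, "text": f"doc {i}"}
                    for i in range(3)])

# Similarity search: find the two nearest neighbors of a query vector.
hits = client.search(collection_name="demo", data=[[0.1] * 8],
                     limit=2, output_fields=["text"])
print(hits)
```

The same client code can point at a Docker or Kubernetes deployment by swapping the local file path for a server URI.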
So: easy setup, plus really good integration with the generative AI tooling ecosystem. Because it's open source, we have contributions from pretty much all of the big tools in generative AI, whether that's Hugging Face, OpenAI, LangChain, Jina, or Airbyte; there are dozens and dozens of these integrations, so you'll most likely be able to use it within your existing tool set. And a strong endorsement of its performance and reliability is the fact that it's used by a lot of really big companies, everyone from NVIDIA, Microsoft, and Salesforce to IKEA and so on.
Very briefly: a big use for vector databases is retrieval-augmented generation, as well as what we now call agents, which are extensions of that framework. To make it clear where the vector database fits in, I've got a schematic here of a RAG pipeline; our research agent pipeline will be an extension of a basic RAG pipeline.

You start off with a knowledge base you want to search over. That might be your internal company documents, images from customers, or videos people have uploaded. You put that through your embedding deep neural network to produce vector embeddings, and you store those in Milvus. Milvus then provides a really convenient and efficient interface for performing a similarity search, essentially a semantic search.

In a RAG chatbot, the user comes along with a question. That question is put through, typically, the same embedding model, and we search the vector database for vectors similar to the query vector, retrieving the close ones, which correspond to items in our knowledge base. Because of the way these models work, those items will be semantically similar to the query; in other words, they'll contain information relevant to the query being answered.
The idea of RAG is then very simple: we put those retrieved items into the context of the prompt we run the large language model on, or the large vision-language model, or whatever foundation model you're using. We augment the user's question with the documents retrieved from the vector database and pass them to the model; because the model has that context, it can give a much more reliable, up-to-date answer. You can think of the vector database as an external memory for your RAG system or agent, one you can update as new facts and new data come in, without having to retrain your foundation model.
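As a concrete illustration, here is a minimal sketch of that pipeline, assuming pymilvus and sentence-transformers are installed; the model, collection, and file names are illustrative:

```python
# Minimal RAG sketch: embed a knowledge base, store it in Milvus, retrieve
# semantically similar chunks, and augment the LLM prompt with them.
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
client = MilvusClient("rag_demo.db")                # Milvus Lite, local file
client.create_collection(collection_name="docs", dimension=384)

# Ingestion: embed documents and store them with their text attached.
docs = ["Milvus is an open source vector database.",
        "Deep Searcher is an open source research agent."]
client.insert(collection_name="docs",
              data=[{"id": i, "vector": embedder.encode(d).tolist(), "text": d}
                    for i, d in enumerate(docs)])

# Serving: embed the question, fetch the closest chunks, build the prompt.
question = "What is Deep Searcher built on?"
hits = client.search(collection_name="docs",
                     data=[embedder.encode(question).tolist()],
                     limit=2, output_fields=["text"])
context = "\n".join(h["entity"]["text"] for h in hits[0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` is then sent to whatever foundation model you are using.
```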
I've got two links here, which I think are really good resources if you're getting started with Milvus or building generative AI applications with vector databases in general. One is the Milvus GitHub, where you've got instructions to download it, a link to the docs, and a lot of really useful tutorials. On the right-hand side is our generative AI learning portal, which has a lot of really useful notebooks taking you through the steps of building more substantive applications. It's a really good resource for learning to build RAG, agents, recommender systems, semantic search, and so on.
But let's now turn to a code walkthrough of Deep Searcher to see how we actually constructed this research agent. Before we go into the code, it helps to have a mental model of what it's doing, so we can keep that scaffolding in mind as we step through and everything makes sense.
Similarly to a RAG system, this research agent has two separate parts. The first is data ingestion, which in our case happens beforehand: you tell it which internal documents, crawled web pages, structured data, or, in theory, streaming data you want to search over, and that gets embedded and stored in Milvus, the vector database. (A feature we're adding for a future version is a more dynamic search-the-web-as-needed.) The other part, the main part, is online serving. The user comes in with a query; we then use a large language model, in our case a reasoning model, to break the question down into a number of sub-questions or subqueries, and a router works out which data store to fetch relevant entries from, which we then do from the vector database.
And then, and I think this is what lets you call it an agent rather than just plain RAG, it has a reflection step that decides what to do next. The prompt asks the LLM: are there any gaps in the questions that have been asked and answered so far, using the information from the data ingestion? If it says yes, there are still knowledge gaps, it generates new queries and goes through the same process, looping until it is satisfied. It stops either because it has exhausted a budget of iterations or tokens or, more likely, because before then it has exhausted all the questions it believes it needs to answer to cover the query without knowledge gaps.
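The exact prompt wording lives in the Deep Searcher codebase; purely as an illustration, a reflection prompt along these lines would do the job (the wording below is hypothetical):

```python
# Hypothetical reflection prompt, sketched for illustration only; the real
# wording is defined in the Deep Searcher repository. The LLM is asked for a
# JSON list of follow-up queries, where an empty list means "no gaps left".
REFLECT_PROMPT = """Original question: {question}
Sub-questions asked and answered so far: {sub_queries}
Retrieved information: {chunks}

Determine whether the information above leaves any knowledge gaps for
answering the original question. If so, output a JSON list of new
sub-questions that would fill them; if not, output an empty list [].
"""
```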
So that's what lets us call it an agent rather than plain RAG: the LLM is being used to route the execution. We can also think of calling the vector database in response to dynamically generated queries as a form of tool usage. So we have conditional execution and we have tool usage, two defining characteristics of an agent. Then, once the model says there are no remaining knowledge gaps, it moves on to the final step, which is to use the large language model to join the answers from the sub-questions into a single, coherent final report.
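Putting the whole loop together, here is a stripped-down sketch of the control flow just described; the function names and prompts are illustrative stand-ins, not Deep Searcher's actual API:

```python
# Illustrative sketch of the retrieve -> reflect -> loop -> report cycle.
# `llm` is assumed to return a parsed list of strings for the decomposition
# and reflection prompts, and a plain string for the report prompt.
def research(question, llm, vector_search, max_iter=3):
    sub_queries = llm(f"Break into sub-questions: {question}")
    answered, evidence = [], []
    for _ in range(max_iter):                  # hard budget on iterations
        for q in sub_queries:
            evidence.extend(vector_search(q))  # vector DB used as a tool
        answered.extend(sub_queries)
        # Reflection: does the evidence so far leave any knowledge gaps?
        gaps = llm(f"Question: {question}\nAnswered: {answered}\n"
                   f"Evidence: {evidence}\nList gap queries, or [].")
        if not gaps:                           # no gaps: stop iterating
            break
        sub_queries = gaps                     # loop again on the gap queries
    # Final step: synthesize one coherent report from all the evidence.
    return llm(f"Write a detailed report answering: {question}\n"
               f"Using this evidence: {evidence}")
```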
One thing I should mention: we typically use a reasoning LLM for these steps, which I'll explain in more detail a few slides on. That's something that's really useful for improving the performance of this reflection step.
Okay, let's go across to the GitHub and actually take a look at some code. And by the way, don't be shy: if you have any questions, feel free to write them in the chat, and I'll stop and answer them as they come up.

This is the GitHub repository for Deep Searcher, and you can see the architecture diagram there. This is a screenshot of its output; you can see it printing the intermediate steps,
the iterations of breaking the query down into subqueries and answering them. You'll notice it says accelerated playback, which points to one of the current challenges of research agents: they're very expensive in terms of foundation model inference. The example I'll show later made something like 75 queries to a reasoning large language model, and that's why it takes ten minutes, half an hour, or potentially longer to run through all the reasoning steps. I actually hit the rate limit when I ran it against an online service, so I had to do ten queries, wait a minute, then do another ten. A key point, then: inference is really the key bottleneck.
Okay, we've got a question from Anna Ruda: how does the LLM know when it has sufficient knowledge, such that there are no knowledge gaps? It's a great question. I think this comes down to the magic emergent properties of LLMs. These models have been trained specifically on multi-step reasoning tasks, and this is just one of the things they can do: they've been trained on so much data, and on related tasks during post-training, that they seem able to perform this task as well.

Does it need to be a reasoning model? No, it doesn't. In fact, it would be advantageous to use cheaper models for some of the other steps. Breaking the query down into subqueries, for example, is something you could do with a more general chatbot LLM, or ideally a much smaller language model fine-tuned for that purpose. That's one of the easy routes to speeding up inference for these systems: using smaller specialized models for each of the steps. So, great question.
Okay, let's go through the code. If you want to try this at home, you can git clone the repository, install the dependencies, and copy and paste. Here's an example of how you actually initiate a call to the deep research agent. We create a default configuration, then override some of the options: we tell the research agent that we want OpenAI as our large language inference service, specifically GPT-4o mini, and for the embedding model we're also going to use OpenAI's embedding service. Deep Searcher supports a number of different inference and embedding services; for example, you might want to use Hugging Face's Sentence Transformers locally for the embedding, or, as in my case, a distilled version of DeepSeek R1 for the large language model.
Then we ingest the data we want to search over. In this case we specify that data in advance, but as we develop this further it will be able to go off autonomously and find relevant sources. Finally, we just call this query function here.
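For reference, the quickstart looks roughly like this; it's paraphrased from memory of the Deep Searcher README, so treat the exact module and argument names as assumptions that may have changed:

```python
# Paraphrased Deep Searcher quickstart (module and argument names may differ
# from the current repository); it mirrors the steps described above.
from deepsearcher.configuration import Configuration, init_config
from deepsearcher.offline_loading import load_from_local_files
from deepsearcher.online_query import query

config = Configuration()                       # default configuration
# Override providers: OpenAI for inference and for embeddings.
config.set_provider_config("llm", "OpenAI", {"model": "gpt-4o-mini"})
config.set_provider_config("embedding", "OpenAIEmbedding",
                           {"model": "text-embedding-3-small"})
init_config(config=config)

# Data ingestion: embed the documents and store them in Milvus beforehand.
load_from_local_files(paths_or_directory="./my_docs")

# Online serving: kick off the iterative research loop.
answer = query("Write a report on freestyle snowboards for intermediates.")
```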
One way I like to work out how code works is literally to step through the functions. So I put this into a file and ran it. Unfortunately I can't really do a live demo, because it would take ten-plus minutes to give back a report; but imagine we've done that, so we know it works. Now let's do step-through debugging, look inside each of these functions, and work out how it's actually doing what it's doing.
We've got here query, from online_query. I use VS Code, so you can just jump to the definition. We see that it reads the configuration's default searcher and calls its query method. So what is the default searcher? We'll have to go to the configuration and see what it's being set to there.

Oh, and just before we go on, we've got another question; actually, no, sorry, I think I've already answered that. It's from Anna, about how the model knows that there are knowledge gaps.
As I said, these foundation models, especially after they've been post-trained for different tasks like chat, reasoning, and evaluation, show massive transfer learning to unseen tasks. I'm not sure whether this exact task of identifying knowledge gaps was in the training somewhere, but it's the power of scale and transfer learning that lets them do tasks like this.
Okay, so we see that when the configuration is initialized, the default searcher is created as this RAG router, and it creates two agents inside; the router works out which one to route to. Chain-of-RAG is a different technique; if you're interested, you can check out the research paper that describes it in more detail. But we're going to look inside the deep search object here, since it seems to contain the logic that implements our research agent architecture.

So let's have a look into deep search now. I've gone across to the file that contains it. And by the way, is this large enough for everyone? Let me see if I can make the font a bit bigger.
Okay, here is the definition of the deep search object. You can see it takes a number of parameters to store on the object: a base LLM, an embedding model, a vector database, and a maximum number of iterations, which is a limit on the number of reflection cycles it can run, as well as some other, more miscellaneous settings that I'll jump over. So now we know which object actually performs the query. Going back, you can see that this method calls deep search's query method. What does that do? Let's go and have a look at the code.
Okay, here we are. Just to reiterate: I would actually do this with a step-through debugger in VS Code, stepping through and stepping into functions to follow the path of execution. That's a really good way to understand what each piece of code is doing and where the code most relevant to understanding the system actually lives.
Also, not everyone's familiar with this, but a really helpful tool for debugging and understanding code is the debug console in VS Code or whatever your IDE is. When you've stopped execution during step-through debugging, you can type expressions into the debug console: what is the shape of this tensor, what is the value of this flag, does some condition hold? It's a really useful way to interrogate the program as it runs and understand what's going on.
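For instance, paused at a breakpoint inside the retrieval loop, you might type expressions like these; the variable names are illustrative, any variable in scope works the same way:

```python
# Illustrative expressions typed one at a time into the debug console while
# paused at a breakpoint; each evaluates in the current stack frame.
len(sub_queries)                  # how many subqueries are queued?
all_search_results[0]             # peek at the first retrieved result
iteration < max_iter              # will the loop run again?
query_vector.shape                # shape of a tensor/array in scope
```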
But let's look at this query function. The first thing it does is call the retrieve method; we're untangling the execution flow to find where the actual agent logic happens, and I think we're getting closer. So let's look at retrieve. Where is it? Here we go: retrieve in turn calls an asynchronous retrieve method, so we're hacking through the layers of indirection to get to the core of it. It runs that function asynchronously, and now it looks like we've actually reached the core logic of how the research agent works.
We start off by setting a variable that holds the maximum number of iterations. This comment indicates that the first thing we do is break the query down into subqueries by prompting the large language model; that's the jump from the user's query to the first set of subqueries, and there's a generate-subqueries method here for it. Now that we've reached the core logic, I won't jump down further until we've gone through the entire loop; then we can look at exactly which prompts the LLM is given for the different tasks, like generating the subqueries and working out where the knowledge gaps are.

Here we've got a color-print logging call, which is what you'll see in the terminal when this step runs. Then it takes the list of current subqueries and adds the new ones to it.
And now we have a loop; we've entered the main logic loop, the inner loop in the online serving area of the architecture diagram (I'm not sure if you can see my mouse pointer, but I'm circling it). The first step is to search for relevant chunks from the vector database, given the query and the subqueries. This actually returns asynchronous tasks; that's the step of calling the vector database with those queries and subqueries.
Those calls just return tasks; they don't get executed until we call await asyncio.gather, which executes the tasks in parallel and waits for the final one to finish before setting the search results.
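In other words, the fan-out looks something like this sketch; the coroutine and variable names are illustrative:

```python
# Sketch of the concurrent fan-out described above: one retrieval coroutine
# per subquery, executed together with asyncio.gather. Names illustrative.
import asyncio

async def search_chunks(vector_db, query):
    # Placeholder async vector-database call returning matching chunks.
    return await vector_db.search(query)

async def retrieve_all(vector_db, queries):
    tasks = [search_chunks(vector_db, q) for q in queries]
    # Runs all retrievals concurrently; returns once the last one finishes.
    return await asyncio.gather(*tasks)
```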
Then we take the results from the subqueries and merge them. We're also keeping track of how many tokens we've consumed, because we want to be able to calculate the cost afterwards, and we might want to set a hard limit, a token budget.
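A minimal sketch of that accounting might look like this, assuming an OpenAI-style usage field on each response; the budget value is hypothetical:

```python
# Sketch of token accounting with a hard budget; assumes each LLM response
# carries an OpenAI-style `usage.total_tokens` field. Budget is hypothetical.
TOKEN_BUDGET = 200_000
consumed_tokens = 0

def record_usage(response):
    global consumed_tokens
    consumed_tokens += response.usage.total_tokens
    if consumed_tokens > TOKEN_BUDGET:
        raise RuntimeError("Token budget exhausted; stopping the agent loop.")
```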
714 00:39:49.965 --> 00:39:52.625 Um, and, um,
715 00:39:55.285 --> 00:39:58.135 okay, so then we sort of, um, yeah, then we take, um,
716 00:39:58.155 --> 00:40:01.695 the search results, uh, put them into this, this list of,
717 00:40:01.835 --> 00:40:03.575 uh, search results from Vector db.
718 00:40:05.595 --> 00:40:10.455 And I think in many cases
719 00:40:11.875 --> 00:40:13.615 we are going to have
720 00:40:13.645 --> 00:40:16.455 duplicate chunks returned from the vector database.
721 00:40:17.635 --> 00:40:21.575 So a good step is just to deduplicate those, so
722 00:40:21.575 --> 00:40:23.535 that we have a list of unique chunks
723 00:40:24.085 --> 00:40:26.455 fetched from the vector database
724 00:40:26.805 --> 00:40:28.415 from those subqueries.
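Deduplication here can be a straightforward first-seen filter; a minimal sketch, assuming each chunk is a dict with a `text` field (a stable id would work equally well):

```python
def deduplicate_chunks(chunks: list[dict]) -> list[dict]:
    # Keep the first occurrence of each chunk, keyed on its text.
    seen: set[str] = set()
    unique: list[dict] = []
    for chunk in chunks:
        key = chunk["text"]
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```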
725 00:40:31.435 --> 00:40:33.775 So this is where we break
726 00:40:33.955 --> 00:40:36.215 if we've reached the maximum number of iterations.
727 00:40:37.755 --> 00:40:42.015 But otherwise the next step is performing the reflection
728 00:40:42.555 --> 00:40:45.615 and getting additional queries that can
729 00:40:46.145 --> 00:40:47.815 cover any knowledge gaps.
730 00:40:49.055 --> 00:40:52.635 So if we go back here now, you can see where we are:
731 00:40:52.635 --> 00:40:56.195 we've gone through this loop, and now we're in this
732 00:40:56.665 --> 00:40:59.035 yellow-orange diamond,
733 00:40:59.995 --> 00:41:02.895 and we're performing this reflection step.
734 00:41:03.835 --> 00:41:05.495 So this is where the LLM
735 00:41:05.835 --> 00:41:07.775 or the reasoning model is going
736 00:41:07.775 --> 00:41:10.295 to actually control the execution.
737 00:41:14.655 --> 00:41:18.275 And so then
738 00:41:18.375 --> 00:41:21.115 we prompt the reasoning model
739 00:41:21.145 --> 00:41:25.155 with another prompt to generate the gap queries.
740 00:41:26.855 --> 00:41:29.155 So,
741 00:41:29.175 --> 00:41:33.355 if the model believes that there are additional queries
742 00:41:33.355 --> 00:41:35.315 that need to be answered to
743 00:41:35.315 --> 00:41:38.275 fill in any knowledge gaps, it will return them
744 00:41:38.375 --> 00:41:40.355 in the gap queries.
745 00:41:41.415 --> 00:41:45.595 So if that's empty, then we know
746 00:41:45.595 --> 00:41:46.835 that we can terminate the loop
747 00:41:48.695 --> 00:41:51.325 and go on to generating the final report.
748 00:41:53.705 --> 00:41:58.165 But otherwise we just add those new subqueries
749 00:41:58.665 --> 00:41:59.685 to the subqueries.
750 00:41:59.685 --> 00:42:01.885 So this is like a stack of
751 00:42:02.065 --> 00:42:03.285 queries to answer.
752 00:42:03.945 --> 00:42:05.565 We add them to that list
753 00:42:06.185 --> 00:42:09.485 and then repeat the iteration.
754 00:42:11.185 --> 00:42:14.005 So we do another loop around here.
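Putting those pieces together, the inner loop just described might look roughly like this, reusing the earlier sketches; `generate_subqueries` and `generate_gap_queries` stand in for the prompt helpers named in the talk and are assumptions, not exact DeepSearcher signatures:

```python
async def retrieve(query: str, vector_db, llm, max_iter: int = 3):
    # Break down the original question into subqueries, then iterate.
    subqueries = await generate_subqueries(llm, query)
    all_chunks: list[dict] = []
    for _ in range(max_iter):                       # break at max iterations
        batches = await search_all(vector_db, query, subqueries)
        merged = all_chunks + [c for batch in batches for c in batch]
        all_chunks = deduplicate_chunks(merged)
        # Reflection: ask the model whether knowledge gaps remain.
        gaps = await generate_gap_queries(llm, query, subqueries, all_chunks)
        if not gaps:                                # empty => no gaps left,
            break                                   # go write the final report
        subqueries.extend(gaps)                     # push onto the query stack
    return all_chunks, subqueries
```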
755 00:42:15.945 --> 00:42:18.285 And, you know, that's essentially it.
756 00:42:18.345 --> 00:42:21.165 If we've got time, I'll just
757 00:42:21.165 --> 00:42:23.765 briefly look into these functions
758 00:42:23.765 --> 00:42:25.005 that actually define the prompts,
759 00:42:25.585 --> 00:42:28.725 so hold your horses, as it were.
760 00:42:28.845 --> 00:42:30.525 I've just got a few more minutes,
761 00:42:30.585 --> 00:42:32.485 and then I'll give some conclusions,
762 00:42:33.305 --> 00:42:35.005 and then we'll leave five
763 00:42:35.025 --> 00:42:36.485 to ten minutes at the end for any questions.
764 00:42:37.185 --> 00:42:40.245 So actually, what I'll say is I'll
765 00:42:40.245 --> 00:42:43.045 leave this for your personal
766 00:42:43.545 --> 00:42:46.285 enjoyment and education.
767 00:42:46.305 --> 00:42:47.885 You can actually look inside these
768 00:42:47.885 --> 00:42:50.605 methods really easily
769 00:42:50.945 --> 00:42:54.605 and find out how we've actually
770 00:42:54.605 --> 00:42:58.565 formatted the prompts to perform these tasks
771 00:42:58.585 --> 00:42:59.685 and to do that successfully.
772 00:43:00.465 --> 00:43:01.965 I think it's always good
773 00:43:01.965 --> 00:43:05.365 to actually read the prompt to understand
774 00:43:06.185 --> 00:43:07.525 what
775 00:43:07.525 --> 00:43:09.285 exactly the model is being instructed to do.
776 00:43:10.105 --> 00:43:11.285 So I encourage you to look
777 00:43:11.285 --> 00:43:14.045 inside these: generate_gap_queries,
778 00:43:14.105 --> 00:43:16.645 search_chunks_from_vectordb,
779 00:43:16.925 --> 00:43:18.165 generate_subqueries, et cetera.
780 00:43:18.905 --> 00:43:22.685 So, very quickly:
781 00:43:22.685 --> 00:43:25.725 after it's done that, it terminates and returns
782 00:43:25.745 --> 00:43:26.805 to this query function,
783 00:43:27.545 --> 00:43:31.205 and now we're in these two steps of
784 00:43:31.725 --> 00:43:36.125 synthesizing the report from all of these subqueries
785 00:43:36.345 --> 00:43:37.965 and retrieved chunks.
786 00:43:39.385 --> 00:43:41.005 And, you know, that's just more
787 00:43:41.005 --> 00:43:42.205 prompting of the same model.
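That synthesis step is essentially one more prompt over everything gathered so far; a hedged sketch, where the template wording is invented here (the real prompts live in deep_search.py):

```python
def build_report_prompt(query: str, subqueries: list[str], chunks: list[dict]) -> str:
    # Concatenate the unique retrieved passages into one context block.
    context = "\n\n".join(c["text"] for c in chunks)
    return (
        f"Original question: {query}\n"
        f"Subquestions answered: {subqueries}\n"
        f"Relevant passages:\n{context}\n\n"
        "Write a detailed report that answers the original question, "
        "drawing only on the passages above."
    )
```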
788 00:43:43.375 --> 00:43:47.555 And when you've done that,
789 00:43:47.555 --> 00:43:48.795 it may take ten minutes,
790 00:43:49.805 --> 00:43:53.025 thirty minutes, depending on what inference service you use.
791 00:43:53.025 --> 00:43:56.225 It will have done multiple iterations
792 00:43:56.405 --> 00:43:58.305 of this reasoning:
793 00:43:59.065 --> 00:44:01.865 breaking down the question into a number of
794 00:44:01.995 --> 00:44:05.625 steps to answer it, working out whether
795 00:44:05.765 --> 00:44:07.425 it should keep going or finish.
796 00:44:07.965 --> 00:44:09.865 And it generates a nice little report.
797 00:44:10.525 --> 00:44:12.145 And I've got an example here.
798 00:44:12.765 --> 00:44:17.235 The question was: how has The Simpsons
799 00:44:17.455 --> 00:44:18.595 evolved over time?
800 00:44:19.735 --> 00:44:22.155 And it's put together this nice little report
801 00:44:22.155 --> 00:44:27.075 really covering all bases, with a nice
802 00:44:27.075 --> 00:44:28.475 conclusion tying things together.
803 00:44:29.575 --> 00:44:33.285 So let's go back to the slides
804 00:44:33.585 --> 00:44:34.885 and we'll wrap things up.
805 00:44:38.505 --> 00:44:42.115 So, could I give a rough overview of the prompts?
806 00:44:42.855 --> 00:44:45.915 I think, just for time,
807 00:44:51.605 --> 00:44:52.455 yeah, let's have a look.
808 00:45:01.515 --> 00:45:04.045 Yeah, so, for time, I'm sorry,
809 00:45:04.385 --> 00:45:06.405 I think I'm going to have to skip looking at the
810 00:45:06.405 --> 00:45:08.205 exact code.
811 00:45:08.665 --> 00:45:13.395 But let me just point you to, gee,
812 00:45:13.415 --> 00:45:15.635 now I've lost my place a bit.
813 00:45:22.065 --> 00:45:24.235 Okay. Yeah, so this is also in deep search,
814 00:45:24.375 --> 00:45:28.955 and we've got the subquery prompt. And you can see,
815 00:45:28.955 --> 00:45:31.555 these prompts are actually in the deep_search.py file.
816 00:45:32.215 --> 00:45:35.035 So for example: "You are an AI content analysis expert,
817 00:45:35.035 --> 00:45:37.475 good at summarizing content. Please summarize...",
818 00:45:37.655 --> 00:45:38.715 and so on.
819 00:45:39.625 --> 00:45:40.795 Then there's a
820 00:45:40.835 --> 00:45:43.875 reflection prompt: "Determine whether additional search
821 00:45:44.035 --> 00:45:46.355 queries are needed based on the original query", et cetera.
822 00:45:47.135 --> 00:45:48.635 There's a re-ranking prompt,
823 00:45:49.815 --> 00:45:51.235 and there's a subquery prompt.
824 00:45:51.535 --> 00:45:52.995 So you can see we've got
825 00:45:53.095 --> 00:45:56.195 at least four different types of prompting
826 00:45:56.195 --> 00:45:58.115 for different subtasks.
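As an illustration of the shape these take, here is a paraphrase of the reflection prompt quoted above; the exact wording, placeholders, and any cap on the number of queries are in deep_search.py, so treat this as a sketch:

```python
# Paraphrased reflection prompt; the query cap of 3 is an assumption.
REFLECT_PROMPT = """Determine whether additional search queries are needed
based on the original query, the subqueries already executed, and the
chunks retrieved so far. If further research is required, return a JSON
list of up to 3 new queries; if no more research is needed, return [].

Original query: {question}
Previous subqueries: {mini_questions}
Retrieved chunks: {chunks}
"""
```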
827 00:45:59.015 --> 00:46:00.595 So check out this file
828 00:46:00.615 --> 00:46:02.675 and you can
829 00:46:03.285 --> 00:46:04.595 look at them in some more detail.
830 00:46:05.135 --> 00:46:07.995 But going back to the slides, let's see.
831 00:46:15.785 --> 00:46:20.555 Okay, so what's
832 00:46:20.555 --> 00:46:23.835 some of the secret sauce behind how
833 00:46:23.835 --> 00:46:25.035 these agents work?
834 00:46:25.975 --> 00:46:28.075 Well, I think one thing is this idea
835 00:46:28.095 --> 00:46:29.555 of conditional computation.
836 00:46:30.095 --> 00:46:33.315 That means that the model can actually decide
837 00:46:33.695 --> 00:46:37.595 how much computation to do based on the current
838 00:46:37.595 --> 00:46:40.195 state of the model output.
839 00:46:40.195 --> 00:46:42.275 And this can be done in a number of different ways.
840 00:46:43.135 --> 00:46:46.435 One
841 00:46:46.435 --> 00:46:49.595 more complex way is you could actually introduce
842 00:46:49.595 --> 00:46:53.555 reasoning tokens that tell the model to
843 00:46:53.555 --> 00:46:56.515 keep generating intermediate output,
844 00:46:56.845 --> 00:46:58.155 doing additional computation,
845 00:46:58.155 --> 00:46:59.675 until some termination condition.
846 00:47:00.095 --> 00:47:02.515 Apparently DeepSeek doesn't use this method,
847 00:47:03.175 --> 00:47:05.915 but it's a good strategy for
848 00:47:05.975 --> 00:47:07.275 doing conditional computation.
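Conceptually, token-level conditional computation is just a decode loop with a learned stopping point, so harder inputs naturally get more compute; a sketch, where `sample_next_token` stands in for a real decoding step:

```python
def generate_with_reasoning(prompt_tokens: list[int],
                            sample_next_token,
                            end_think_token: int,
                            max_tokens: int = 4096) -> list[int]:
    # Keep emitting "thinking" tokens until the model itself produces
    # the terminator (or a safety cap is hit).
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        nxt = sample_next_token(tokens)
        tokens.append(nxt)
        if nxt == end_think_token:  # termination condition reached
            break
    return tokens
```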
849 00:47:08.695 --> 00:47:10.635 I think the second one is
850 00:47:11.115 --> 00:47:12.195 reinforcement learning training.
851 00:47:12.415 --> 00:47:15.195 Really simple, conceptually:
852 00:47:15.195 --> 00:47:18.595 taking a strong base model,
853 00:47:18.815 --> 00:47:19.995 like DeepSeek did,
854 00:47:20.615 --> 00:47:23.195 and then applying a form
855 00:47:23.195 --> 00:47:24.835 of reinforcement learning on
856 00:47:24.835 --> 00:47:26.035 very high-quality reasoning data.
857 00:47:26.455 --> 00:47:28.395 And then there's this "aha" moment where
858 00:47:29.025 --> 00:47:30.965 the model just starts to learn how to reason.
859 00:47:34.025 --> 00:47:38.645 It's not all
860 00:47:38.645 --> 00:47:39.685 rainbows and sunshine.
861 00:47:40.195 --> 00:47:41.325 There are, of course,
862 00:47:41.485 --> 00:47:42.925 some quite major challenges with these.
863 00:47:42.925 --> 00:47:45.125 And I think the first one is just the cost.
864 00:47:45.785 --> 00:47:49.605 If you're using OpenAI's Deep Research agent,
865 00:47:49.625 --> 00:47:51.205 you'll have to have their Pro subscription,
866 00:47:51.245 --> 00:47:52.605 I think it's $200 a month,
867 00:47:53.325 --> 00:47:56.725 and I believe they're still not covering the cost
868 00:47:56.725 --> 00:47:58.845 of inference by charging that.
869 00:47:59.625 --> 00:48:01.325 One thing I discovered
870 00:48:01.325 --> 00:48:02.925 actually running these queries is
871 00:48:03.785 --> 00:48:05.725 how much inference they actually require:
872 00:48:05.725 --> 00:48:09.005 firstly, these reasoning models
873 00:48:09.075 --> 00:48:12.605 typically use a lot more inference
874 00:48:12.605 --> 00:48:14.125 going through their reasoning steps
875 00:48:14.585 --> 00:48:16.045 for a given prompt,
876 00:48:16.065 --> 00:48:19.045 but also it'll need to do multiple calls
877 00:48:19.065 --> 00:48:20.325 of these,
878 00:48:20.875 --> 00:48:23.565 many more calls than a simple RAG system.
879 00:48:25.345 --> 00:48:29.445 Again, hallucinations, as with all
880 00:48:29.445 --> 00:48:32.085 foundation models, are a general problem;
881 00:48:32.115 --> 00:48:33.525 there can be reasoning errors.
882 00:48:33.745 --> 00:48:37.805 In its intermediate reasoning trace,
883 00:48:37.985 --> 00:48:40.325 if there's some incorrect step,
884 00:48:40.955 --> 00:48:43.045 then all the following steps could fail.
885 00:48:43.945 --> 00:48:47.875 And then I think finally, to actually
886 00:48:47.875 --> 00:48:50.275 train these reasoning models, we need
887 00:48:50.275 --> 00:48:52.395 to have really high-quality
888 00:48:52.865 --> 00:48:54.915 open-source reasoning-trace datasets.
889 00:48:55.455 --> 00:48:57.715 That's something people are working on, so
890 00:48:57.715 --> 00:49:01.475 that we can reproduce some of these results from
891 00:49:01.475 --> 00:49:04.115 OpenAI and its competitors.
892 00:49:05.745 --> 00:49:08.595 Okay. So I can tell Sachi is
893 00:49:08.595 --> 00:49:10.155 hurrying me along, so very quickly:
894 00:49:10.255 --> 00:49:13.155 in terms of the cost, some
895 00:49:13.155 --> 00:49:14.835 solutions people are working on for
896 00:49:14.835 --> 00:49:17.155 that are specialized hardware,
897 00:49:17.155 --> 00:49:18.955 but also other types of reasoning.
898 00:49:18.975 --> 00:49:23.115 There's an idea called continuous chain of thought that
899 00:49:23.615 --> 00:49:25.875 doesn't use discrete tokens to reason
900 00:49:25.935 --> 00:49:28.555 but actually uses a continuous latent variable,
901 00:49:28.685 --> 00:49:32.035 which is a lot more cost-effective.
902 00:49:32.735 --> 00:49:35.555 And then there are these barriers to entry, both
903 00:49:35.555 --> 00:49:37.155 with the open-source software
904 00:49:37.855 --> 00:49:39.915 and with the data and the models.
905 00:49:40.695 --> 00:49:42.875 So players like Zilliz
906 00:49:42.895 --> 00:49:46.715 and Hugging Face, we are working on
907 00:49:46.955 --> 00:49:48.955 reproducing these results fully open source,
908 00:49:49.455 --> 00:49:50.915 so that new folks can just
909 00:49:50.915 --> 00:49:53.395 take away the learnings, with systems that work,
910 00:49:53.415 --> 00:49:55.875 and really easily build
911 00:49:55.875 --> 00:49:57.195 successful research agents.
912 00:49:58.535 --> 00:50:00.315 So I think that's it from me,
913 00:50:00.535 --> 00:50:03.635 and it looks like I went a bit over time,
914 00:50:03.635 --> 00:50:05.795 but I think we've got five minutes left for questions.
915 00:50:06.585 --> 00:50:08.435 Yeah. So we have a few questions,
916 00:50:08.495 --> 00:50:10.115 so let's just go through them.
917 00:50:10.935 --> 00:50:13.635 How does the vector embedding work
918 00:50:13.635 --> 00:50:17.155 for images? Are vectors created for pixels
919 00:50:17.155 --> 00:50:18.195 or image portions?
920 00:50:21.835 --> 00:50:25.925 Yeah, okay.
921 00:50:26.805 --> 00:50:29.025 So, yeah.
922 00:50:29.025 --> 00:50:33.265 Okay, so this framework
923 00:50:33.265 --> 00:50:35.545 that I presented is very general:
924 00:50:36.165 --> 00:50:39.185 all you need is some concept of embedding
925 00:50:39.565 --> 00:50:41.585 and some sort of foundation model.
926 00:50:42.205 --> 00:50:46.585 Typically,
927 00:50:47.555 --> 00:50:49.345 there are really good open-source models
928 00:50:49.575 --> 00:50:51.705 that can perform embedding of images,
929 00:50:52.485 --> 00:50:54.625 and they typically work on the whole image.
930 00:50:55.205 --> 00:50:58.665 They might be looking at patches, putting
931 00:50:58.665 --> 00:51:00.225 those into a vision transformer,
932 00:51:00.685 --> 00:51:02.865 and then outputting an embedding for the entire image.
933 00:51:03.805 --> 00:51:06.545 And there are models that will allow you
934 00:51:06.545 --> 00:51:10.305 to embed images into the same space as text.
935 00:51:11.005 --> 00:51:14.385 So all of the same concepts apply here.
936 00:51:14.805 --> 00:51:17.025 You just use a different embedding model that's specific
937 00:51:17.045 --> 00:51:18.545 to images, or images and text.
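For instance, a CLIP-style checkpoint embeds whole images (seen as ViT patches, not individual pixels) and text into the same vector space; a minimal sketch using sentence-transformers' clip-ViT-B-32, though any similar open model would do:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP maps images and text into one shared embedding space.
model = SentenceTransformer("clip-ViT-B-32")

image_vec = model.encode(Image.open("photo.jpg"))  # one vector per image
text_vec = model.encode("a photo of a dog")        # comparable text vector

print(util.cos_sim(image_vec, text_vec))           # cross-modal similarity
```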
938 00:51:21.405 --> 00:51:24.145 So, in the iterated search-reasoning cycle,
939 00:51:25.085 --> 00:51:27.305 is the iterated reasoning being performed
940 00:51:27.305 --> 00:51:29.705 by the reasoning LLM and the search by the vector DB?
941 00:51:30.325 --> 00:51:33.505 So the LLM
942 00:51:33.525 --> 00:51:36.105 never actually performs an action; it just
943 00:51:36.105 --> 00:51:38.465 gives the instruction to perform an action.
944 00:51:39.045 --> 00:51:43.105 It will say, okay, search the vector database for
945 00:51:43.165 --> 00:51:45.545 this, and then the code will actually
946 00:51:45.545 --> 00:51:46.785 perform that search.
947 00:51:47.165 --> 00:51:50.185 But yeah, it's the vector database that is performing
948 00:51:50.185 --> 00:51:51.305 that similarity search.
949 00:51:51.845 --> 00:51:54.825 The LLM just requests an instance
950 00:51:54.845 --> 00:51:55.945 of tool usage.
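That division of labour can be made concrete in a few lines: the model emits structured output naming a tool, and the host code is what actually executes it; a sketch, with `vector_db` again as an assumed client stand-in:

```python
import json

def dispatch(llm_output: str, vector_db):
    # The model's structured output only *names* the action, e.g.
    # {"tool": "search_vector_db", "args": {"query": "..."}};
    # this host code, not the model, performs it.
    tools = {
        "search_vector_db": lambda args: vector_db.search(args["query"], top_k=5),
    }
    call = json.loads(llm_output)
    return tools[call["tool"]](call["args"])
```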
951 00:51:57.205 --> 00:51:58.745 "Does it need to be a reasoning model?"
952 00:51:58.765 --> 00:51:59.785 So yeah, I think we've covered this.
953 00:51:59.885 --> 00:52:04.585 No, and I think it's actually probably good if many
954 00:52:04.585 --> 00:52:06.425 of the other steps are not reasoning models,
955 00:52:06.425 --> 00:52:10.185 because you can reduce the cost of inference
956 00:52:10.185 --> 00:52:14.945 to run the system. "Is the search semantic search?"
957 00:52:15.245 --> 00:52:19.685 So,
958 00:52:20.595 --> 00:52:24.365 yeah, Milvus has support for
959 00:52:24.625 --> 00:52:26.645 hybrid search,
960 00:52:27.185 --> 00:52:31.805 so lexical plus semantic search. With Milvus 2.5
961 00:52:32.305 --> 00:52:33.325 you could implement that,
962 00:52:33.865 --> 00:52:35.405 and that would just
963 00:52:35.405 --> 00:52:39.685 be a very small modification to that
964 00:52:39.685 --> 00:52:42.725 vector database lookup step in the research agent.
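With pymilvus 2.5 that looks roughly like the following; the collection and field names are assumptions, and the collection is presumed to define both a dense embedding field and a BM25-indexed sparse full-text field:

```python
from pymilvus import MilvusClient, AnnSearchRequest, RRFRanker

client = MilvusClient(uri="http://localhost:19530")

question = "How has The Simpsons evolved over time?"
query_embedding = embed(question)  # stand-in for your dense embedding model

# Semantic request over the dense field plus a lexical (BM25) request
# over the sparse full-text field.
dense_req = AnnSearchRequest(data=[query_embedding], anns_field="vector",
                             param={"metric_type": "IP"}, limit=10)
sparse_req = AnnSearchRequest(data=[question], anns_field="sparse",
                              param={}, limit=10)

hits = client.hybrid_search(
    collection_name="deep_searcher_docs",
    reqs=[dense_req, sparse_req],
    ranker=RRFRanker(),  # reciprocal-rank fusion of the two result lists
    limit=10,
)
```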
965 00:52:44.345 --> 00:52:47.245 "Can we choose the subquery number param?"
966 00:52:47.505 --> 00:52:49.965 So I think this is one of those things
967 00:52:49.965 --> 00:52:51.485 where it's actually best...
968 00:52:51.825 --> 00:52:54.845 so we are designing the system to be
969 00:52:54.865 --> 00:52:57.045 as autonomous as possible.
970 00:52:57.985 --> 00:53:00.605 You could hard-code
971 00:53:00.675 --> 00:53:02.645 a maximum number.
972 00:53:02.945 --> 00:53:04.565 You could re-rank
973 00:53:04.565 --> 00:53:06.045 them and take a maximum.
974 00:53:06.825 --> 00:53:09.845 I think the simplest implementation, though, just lets
975 00:53:09.845 --> 00:53:12.165 the foundation model decide how many
976 00:53:12.345 --> 00:53:13.765 there should be.
977 00:53:14.185 --> 00:53:17.285 But yeah, that's just a design choice.
978 00:53:17.885 --> 00:53:19.205 I would recommend letting the
979 00:53:19.205 --> 00:53:23.125 model actually decide that. Okay.
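One simple way to "let the model decide" is to constrain the shape of the structured output but not its length; a sketch using a Pydantic schema that could be passed to any structured-output API:

```python
from pydantic import BaseModel, Field

class SubQueries(BaseModel):
    # Unbounded list: the model chooses how many subqueries to emit.
    queries: list[str]

class CappedSubQueries(BaseModel):
    # Alternative design choice: hard-code a maximum via the schema.
    queries: list[str] = Field(max_length=5)
```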
980 00:53:23.585 --> 00:53:25.405 You do have a few questions in the chat.
981 00:53:25.905 --> 00:53:29.925 "Have you benchmarked this against the OpenAI solution?
982 00:53:30.025 --> 00:53:32.805 And does the framework also provide tools
983 00:53:32.945 --> 00:53:36.005 to do web crawling to collect relevant data for the query?"
984 00:53:36.595 --> 00:53:39.325 Yeah, great question. So I guess
985 00:53:39.325 --> 00:53:42.325 these different open-source
986 00:53:43.195 --> 00:53:46.245 deep research agents have had
987 00:53:46.245 --> 00:53:47.245 different goals.
988 00:53:47.745 --> 00:53:50.325 The goal of ours was not so much
989 00:53:50.385 --> 00:53:54.365 to reproduce the specific benchmark that
990 00:53:54.585 --> 00:53:58.605 OpenAI ran theirs on,
991 00:53:58.985 --> 00:54:00.045 but to produce a
992 00:54:00.045 --> 00:54:01.125 system that's understandable.
993 00:54:01.865 --> 00:54:04.005 We can use it for teaching purposes.
994 00:54:04.465 --> 00:54:06.645 But I recommend checking out
995 00:54:06.945 --> 00:54:09.565 the deep research agent from Hugging Face, where
996 00:54:09.565 --> 00:54:13.445 one of their primary motivations was
997 00:54:13.985 --> 00:54:15.805 to achieve a similar number
998 00:54:16.025 --> 00:54:18.325 or even exceed the benchmark, which they did.
999 00:54:19.605 --> 00:54:21.165 I think it's also just interesting to
1000 00:54:21.165 --> 00:54:24.245 compare different architectures for research agents.
1001 00:54:26.175 --> 00:54:28.225 Okay. "Does this framework also provide tools
1002 00:54:28.285 --> 00:54:30.545 to do the web crawling to collect relevant data?"
1003 00:54:31.325 --> 00:54:34.745 Yes, so
1004 00:54:34.745 --> 00:54:37.865 we've got the tools to call a number
1005 00:54:37.865 --> 00:54:40.065 of different web crawling services.
1006 00:54:40.845 --> 00:54:43.025 I think we're still adding that
1007 00:54:43.045 --> 00:54:45.065 as a dynamic tool call,
1008 00:54:45.645 --> 00:54:47.625 but I think that's something for the near future.
1009 00:54:48.165 --> 00:54:51.625 But you can just say, okay, here is a domain name,
1010 00:54:51.785 --> 00:54:54.105 I want to fetch all
1011 00:54:54.105 --> 00:54:57.865 of my data from this domain, and then it'll call FireCrawl
1012 00:54:57.925 --> 00:54:59.145 or whatever service you're using,
1013 00:54:59.565 --> 00:55:03.265 pull that in, index it, and then run your query.
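For reference, the end-to-end flow resembles the snippet below, based on the DeepSearcher README at the time; module and function names may have changed, so treat them as assumptions to verify against the repo:

```python
from deepsearcher.configuration import Configuration, init_config
from deepsearcher.offline_loading import load_from_website
from deepsearcher.online_query import query

# Configure the LLM, embedding model, and Milvus connection.
init_config(Configuration())

# Crawl a domain (FireCrawl requires FIRECRAWL_API_KEY) and index it.
load_from_website(urls="https://example.com")

# Then run the iterative research loop over the indexed data.
report = query("How has The Simpsons evolved over time?")
```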
1014 00:55:05.805 --> 00:55:08.865 So, great question. And I think that brings us
1015 00:55:08.925 --> 00:55:11.065 just up to about time. Yeah.
1016 00:55:11.295 --> 00:55:12.745 Yeah, right at the top of the hour.
1017 00:55:12.925 --> 00:55:15.905 Thank you all so much for joining today.
1018 00:55:16.365 --> 00:55:19.425 Stefan's put his information on the screen here if
1019 00:55:19.425 --> 00:55:20.545 you have questions for him.
1020 00:55:20.925 --> 00:55:22.585 We also have office hours.
1021 00:55:22.965 --> 00:55:25.345 If you want
1022 00:55:25.645 --> 00:55:28.625 a specialized one-on-one session,
1023 00:55:28.885 --> 00:55:30.425 the QR code for that is right here.
1024 00:55:30.965 --> 00:55:33.785 And we also have a workshop coming up in
1025 00:55:33.785 --> 00:55:34.865 person in Palo Alto.
1026 00:55:35.125 --> 00:55:37.865 If you are based in the Bay Area, which I did see some
1027 00:55:37.865 --> 00:55:41.145 of you are, we hope you'll register for that.
1028 00:55:41.865 --> 00:55:43.425 So thank you all for joining today,
1029 00:55:43.685 --> 00:55:46.505 and we look forward to seeing you at our next webinar.
1030 00:55:47.055 --> 00:55:48.105 Have a good rest of your day.
1031 00:55:48.365 --> 00:55:49.425 Thanks, everyone, for coming,
1032 00:55:49.485 --> 00:55:52.225 and hope to see you in March for our
1033 00:55:52.225 --> 00:55:53.465 workshop with OpenAI.
1034 00:55:53.655 --> 00:55:54.305 Okay. Take care.
Meet the Speaker
Stefan Webb
Developer Advocate, Zilliz
Stefan Webb is a Developer Advocate at Zilliz, where he advocates for the open-source vector database, Milvus. Prior to this, he spent three years in industry as an Applied ML Researcher at Twitter and Meta, collaborating with product teams to tackle their most complex challenges. Stefan holds a PhD from the University of Oxford and has published papers at prestigious machine learning conferences such as NeurIPS, ICLR, and ICML. He is passionate about generative AI and is eager to leverage his deep technical expertise to contribute to the open-source community.