Introduction
My name is Tim Spann and I work at Zilliz on developer advocacy for the amazing Open Source project, Milvus. Open Source and helping developers, engineers and cool projects have been my passion for a number of years, covering things like Hadoop, Spark, Kafka, NiFi, Flink, Iceberg, Kudu, HBase, Hive and Spring.
My Medium posts: https://medium.com/@tspann
My YouTube Channel: https://www.youtube.com/@FLaNK-Stack
New Challenges
For the last two years I have been working at the intersection of streaming and AI, and this is where I first saw the importance of a database for AI that could store and query any type of data in whatever mode is needed.
I have been working with generative AI, but I needed to be where the future is going and where the new data processing is happening. Unstructured data processing is needed now and I need to spread the word. This is the place. With Milvus, Towhee, Attu and integrations with Kafka and all the cool LlamaX frameworks, this is how to get it done. We need to build up a global group of unstructured data engineers and data superstars. I am so excited to continue this accelerated journey.
I have been interested in machine learning, natural language processing and edge AI for nearly a decade. Some of my earlier articles:
- https://community.cloudera.com/t5/Community-Articles/Using-Sentiment-Analysis-and-NLP-Tools-With-HDP-2-x-and-HDF/ta-p/249102
- https://community.cloudera.com/t5/Community-Articles/Open-NLP-Example-Apache-NiFi-Processor/ta-p/249293
- https://community.cloudera.com/t5/Community-Articles/Creating-HTML-from-PDF-Excel-and-Word-Documents-using-Apache/ta-p/247968
More Than Vector Databases
Milvus alone is a powerful datastore and reason enough to want to work for Zilliz. This is just the start of a new paradigm shift for the next Generative AI-powered Data Revolution. The need for powerful, fast ways to do unstructured data processing and Vector ETL is already evident and growing. In the next few years, we will see a rise in unstructured data engineering and processing like we did with Spark, Flink and Kafka for structured and semi-structured data.
The need to load logs, email, documents, Slack messages, photos, images, videos, audio files and even more binary formats will transform industries. When I started with Big Data, we had to move a lot of JSON, CSV, XML, Relational Tables and structured data. We still have those files and we have them streaming, but we also need our data vectorized and available for similarity search and fast access.
We will be building as many prompts as we build SQL statements. Many of these data formats will need to be used for the same applications. We can add JSON metadata along with our vectors for additional types of searching, while the lines between unstructured data and structured data become blurred as models and prompts require a federated view of data, especially for live use cases.
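To make the idea of JSON metadata stored alongside vectors concrete, here is a minimal sketch in plain Python. It is not the Milvus API: the record layout, the `filtered_search` helper and the sample data are all hypothetical, made up purely for illustration. It filters records on metadata first and then ranks the survivors by cosine similarity to a query vector.

```python
import math

# Hypothetical toy records: an embedding vector plus JSON-style metadata.
RECORDS = [
    {"id": 1, "vector": [1.0, 0.0, 0.0], "meta": {"source": "email", "lang": "en"}},
    {"id": 2, "vector": [0.9, 0.1, 0.0], "meta": {"source": "slack", "lang": "en"}},
    {"id": 3, "vector": [0.0, 1.0, 0.0], "meta": {"source": "email", "lang": "de"}},
    {"id": 4, "vector": [0.8, 0.2, 0.1], "meta": {"source": "email", "lang": "en"}},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filtered_search(records, query, metadata_filter, top_k=2):
    """Keep records whose metadata matches every key in the filter,
    then return the ids of the top_k most similar vectors."""
    candidates = [
        r for r in records
        if all(r["meta"].get(k) == v for k, v in metadata_filter.items())
    ]
    scored = [(cosine_similarity(r["vector"], query), r["id"]) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record_id for _, record_id in scored[:top_k]]


# Only records tagged as email are considered for the similarity ranking.
print(filtered_search(RECORDS, [1.0, 0.0, 0.0], {"source": "email"}))
```

A real vector database would do this at scale with an index rather than a linear scan, but the shape of the query is the same: a structured filter on metadata combined with a similarity ranking on vectors.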
I have already seen this for mass transit applications and this will move into all enterprise applications including IoT and fraud analytics.
The future has a lot more data, a huge need for unstructured data processing, a scalable open-source AI database that can handle the new data and an ever-increasing variety of AI Models.
It Takes a Team
I am very fortunate to have collaborated with a number of my coworkers before and was eager to work with them again. I was also incredibly impressed with everyone I spoke with before joining. This is an incredibly skilled, intelligent team with a deep background in what it takes to bring innovative technology to the mainstream. The future starts now. Let’s dive in.
Community
Join me in the New York City area for meetups and other events.
I am also assisting with many of the AI events in Princeton and working with StartupGrind Princeton and Trenton, Applied Generative AI and the NJ GAI Meetup.