Why Dopple Labs Chose Zilliz Cloud over Pinecone for Secure and High-Performance Vector Searches

A Pinecone alternative with granular control, effective scaling, and high performance
Billion-scale vector data storage and retrieval
Open source for enhanced ML & VectorDB performance
I appreciated using the open standard evaluation benchmarks for machine learning in general; this is also true for vector databases. The ones that Zilliz often publicizes have been beneficial, and the fact that they are open is significant.
Sam Butler, Director of Machine Learning, Dopple Labs
About Dopple AI
Dopple Labs Inc. is the visionary force behind Dopple.AI, an innovative platform revolutionizing human-AI interactions. Available on iOS and Android, Dopple.AI enables users to create lifelike AI clones, or "Dopples," seamlessly integrating video, audio, and messaging for immersive experiences.
At its core, Dopple.AI leverages advanced Llama2-based LLM technology, where users interact with Dopples through chat threads across various devices. Whether created by Dopple Labs or users themselves, Dopples engage in lifelike conversations based on user inputs and prompts.
Recently, Dopple Labs introduced groundbreaking features such as image reactions, where emotion-rich images enhance user interactions with Dopples. Additionally, voice captioning and real-time audio streaming further elevate the audio-visual experience, fostering deeper engagement and connection.
As Dopple.AI continues to push the boundaries of AI-driven companionship, it remains at the forefront of redefining the way individuals interact with personalized AI clones.
The Challenges: Bringing memory to chatbot conversations
Dopple AI users demonstrate a deep understanding of the platform's AI characters, employing advanced techniques to shape their interactions. They utilize features like message editing and rerolling to guide conversations, showcasing their control over the dialogue and crafting personalized exchanges. Essentially, users act as "prompt engineers," skillfully constructing conversations with AI characters. They steer dialogues to align with their preferences and objectives through strategic prompts and edits, resulting in dynamic interactions.
The team at Dopple AI, led by Sam Butler, Director of Machine Learning, builds these kinds of features with the Retrieval Augmented Generation (RAG) technique, implementing a memory storage system that stores summaries of conversations. The team takes a few messages for context, plus the main message they want to remember, and uses a separate LLM to summarize them. The resulting summary is then embedded and stored in a vector database.
When a user submits a query, it is converted into an embedding and used to search for similar embeddings in the vector database. This gives the system access to past conversations beyond the immediate context window of the prompt given to the LLM. By leveraging embeddings from previous interactions, the LLM gains long-term memory capabilities. For instance, if a user asks "What is my pet fish's name?" and the conversation about their pet fish occurred in the past, outside of the context window, the system can convert that query into an embedding and retrieve the relevant memory from the vector database.
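The summarize, embed, store, and retrieve loop described above can be sketched in a few lines. This is a minimal illustration, not Dopple Labs' actual implementation: the summarizer and embedder are toy stand-ins for the separate LLM and embedding model the text mentions, and all names are assumptions.

```python
# Minimal sketch of a RAG memory loop: summarize a window of messages,
# embed the summary, store it, and retrieve by similarity later.
# summarize() and embed() are toy stand-ins for real model calls.

def summarize(messages):
    """Condense a short window of chat messages into one summary string.
    In production this would call a separate summarization LLM."""
    return " | ".join(m["text"] for m in messages)

def embed(text):
    """Map text to a fixed-size vector. A real system would call an
    embedding model; this toy version hashes characters into 8 buckets."""
    vec = [0.0] * 8
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

class MemoryStore:
    """Stand-in for a vector database collection of conversation summaries."""
    def __init__(self):
        self.items = []  # (embedding, summary) pairs

    def add(self, messages):
        summary = summarize(messages)
        self.items.append((embed(summary), summary))

    def search(self, query, top_k=1):
        """Embed the query and return the top-k most similar summaries."""
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, e)), s) for e, s in self.items]
        scored.sort(reverse=True)
        return [s for _, s in scored[:top_k]]

store = MemoryStore()
store.add([{"text": "My pet fish is named Bubbles."},
           {"text": "He lives in a bowl on my desk."}])
store.add([{"text": "I went hiking last weekend."}])

# Later, a query outside the LLM's context window can still recall the memory.
memories = store.search("What is my pet fish's name?")
```

In a real deployment, `MemoryStore` would be a Zilliz Cloud (Milvus) collection and the retrieved summaries would be injected into the LLM prompt as context.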
Re-roll to control the role-playing story line
Users have the flexibility to edit their most recent message, allowing them to refine their conversation with the LLM. If they receive a response they're not satisfied with, they can choose to "reroll" without altering their last message, prompting the LLM for a new response to explore different options. Additionally, users can revisit and modify their last message to influence the LLM's response, crafting their conversation step by step to align with their desired direction. This level of control is particularly valued by advanced users who have a clear objective in mind for the conversation. Conversely, novice or less frequent users may take a more passive role, allowing the conversation to unfold naturally. However, Dopple AI's core user base typically engages in active participation akin to embarking on a quest or engaging in role-playing scenarios, reflecting their intent to guide the conversation towards specific outcomes.
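The reroll and edit mechanics amount to simple operations on the conversation history: rerolling re-samples a new response to the unchanged last message, while editing replaces the last user message and regenerates. A rough sketch, with the LLM call faked by a random choice and all names illustrative:

```python
# Illustrative conversation-state sketch of "reroll" and "edit" controls.
# _generate() is a stand-in for an LLM call.
import random

class Thread:
    def __init__(self):
        self.history = []  # list of ("user" | "bot", text) pairs

    def _generate(self, prompt):
        # Re-sampling the same prompt can yield a different reply.
        return random.choice([f"Reply A to: {prompt}",
                              f"Reply B to: {prompt}",
                              f"Reply C to: {prompt}"])

    def send(self, user_msg):
        self.history.append(("user", user_msg))
        reply = self._generate(user_msg)
        self.history.append(("bot", reply))
        return reply

    def reroll(self):
        """Discard the last bot reply and sample a new one, leaving the
        user's last message untouched."""
        assert self.history and self.history[-1][0] == "bot"
        self.history.pop()
        reply = self._generate(self.history[-1][1])
        self.history.append(("bot", reply))
        return reply

    def edit_last(self, new_msg):
        """Replace the last user message and regenerate the response."""
        self.history = self.history[:-2]  # drop last user msg + bot reply
        return self.send(new_msg)
```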
Each conversation summary is stored as a unique item in the database, allowing for efficient filtering based on user names. Summaries are generated by consolidating every three or four messages into one coherent summary, which is then seamlessly integrated into a vector database. This process continues indefinitely, ensuring a continuous accumulation of conversation memories. Memories are retained unless a user explicitly deletes a conversation thread, in which case the associated memories are also removed. However, if a conversation is intended to be revisited or continued in the future, the memories remain accessible within the vector database.
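The storage policy above can be sketched as a buffer that flushes a summary every few messages, tags each record with the user's name for filtering, and deletes a thread's records on request. All names here are illustrative assumptions; in production the records would live in a vector database and the filter would be a metadata expression on the user field.

```python
# Toy sketch of the storage policy: consolidate every N messages into one
# summary tagged with the user, filter memories by user name, and remove
# a thread's memories when the thread is deleted.

SUMMARY_WINDOW = 4  # consolidate every three or four messages, per the text

class SummaryStore:
    def __init__(self):
        self.records = []   # each: {"user": ..., "thread": ..., "summary": ...}
        self.buffers = {}   # (user, thread) -> messages awaiting summarization

    def add_message(self, user, thread, text):
        buf = self.buffers.setdefault((user, thread), [])
        buf.append(text)
        if len(buf) >= SUMMARY_WINDOW:
            # In production the summary would come from a separate LLM
            # and the record would be embedded into the vector database.
            self.records.append({"user": user, "thread": thread,
                                 "summary": " / ".join(buf)})
            buf.clear()

    def memories_for(self, user):
        # Equivalent to a vector-DB metadata filter on the user field.
        return [r["summary"] for r in self.records if r["user"] == user]

    def delete_thread(self, user, thread):
        """Deleting a thread also removes its associated memories."""
        self.records = [r for r in self.records
                        if not (r["user"] == user and r["thread"] == thread)]
        self.buffers.pop((user, thread), None)
</test>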
An intriguing aspect of this RAG implementation is that, because many of these characters and media references are timeless and frequently present in the LLM's training data, automated fact-checking becomes less critical. Users also prioritize entertainment value over factual accuracy.
The Solutions: Zilliz Cloud for Secure and High-performance Vector Searches
Sam Butler also oversees the coordination between the ML team and the frontend teams responsible for implementing designs in their app and web platforms. One of their biggest challenges, like many in the industry, is staying abreast of the latest advances in models. With new models constantly emerging and the state of the art evolving, keeping up requires significant effort. This is where partnering with a managed service provider like Zilliz proves invaluable, allowing them to focus on their core product while leveraging Zilliz's expertise in database optimization.
They transitioned from Pinecone to Zilliz Cloud on GCP because they needed large-scale retrieval and a tool whose performance would keep scaling as their indexes grew. While Pinecone offered managed services, it lacked the granular control and truly effective scaling they required. Access to insights and performance metrics, such as compute allocation and consistent real-time performance as indexes expanded, was crucial. Anticipating hundreds of millions to billions of data points in their vector indexes, they sought a solution that could handle that scale, leading them to choose Zilliz Cloud for this use case.
After encountering challenges with Pinecone, Sam explored various benchmarks and leaderboards for different vector databases, eventually discovering Zilliz Cloud. The team at Dopple AI, particularly interested in benchmarking results, was excited about the discovery and eager to explore its potential benefits further.
What’s next for Dopple Labs?
Sam and his team recently enhanced their service by introducing a visual-audio experience. They began by integrating image reactions, giving each character a diverse set of approximately 800 to 900 images depicting 30 emotions, each with several different versions. During inference, another LLM determines the mood of the response and selects a random image from the corresponding emotion category to ensure variety. They also introduced voice captioning, streaming characters from their LLM inference provider to ElevenLabs for real-time audio streaming. This synchronized audio-visual experience displays emotional reaction images alongside the text as it appears in the app. And this is just the beginning: they plan to add live voice calls, moving images, and video. Eventually, users will be able to make FaceTime-style calls with their Dopples for real-time conversations.
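The image-reaction flow above reduces to two steps: classify the response's mood, then pick a random image from that mood's pool for variety. A hedged sketch with a keyword stub in place of the mood-classifying LLM; mood names, pool sizes, and filenames are all illustrative:

```python
# Sketch of the emotion-reaction selection flow: classify mood, then pick
# a random image from that mood's category. classify_mood() is a keyword
# stub standing in for the LLM the text describes.
import random

MOODS = ["happy", "sad", "angry"]  # the real app uses ~30 emotions
# Several image versions per mood, drawn from a per-character pool of
# roughly 800-900 images in the real system.
IMAGE_POOL = {m: [f"{m}_{i}.png" for i in range(30)] for m in MOODS}

def classify_mood(response_text):
    """Toy mood classifier; in production another LLM makes this call."""
    text = response_text.lower()
    if "sorry" in text or "miss" in text:
        return "sad"
    if "!" in text:
        return "happy"
    return "angry"

def pick_reaction(response_text, rng=random):
    """Random choice within the mood category keeps reactions varied."""
    mood = classify_mood(response_text)
    return rng.choice(IMAGE_POOL[mood])
```

The random draw within a category is what prevents the same image from repeating every time a character expresses the same emotion.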