An LLM Powered Text to Image Prompt Generation with Milvus
The Background Story
Since encountering the first open-source image-generating AI systems, I fell in love with their potential to create visually appealing images from text. I've also seen that people who use this technology get a significant advantage—they have more time to be creative and generate better prompts than I can.
I couldn't shake this feeling. So, I started searching through webpages to find cool images and the prompts that made them. Then, I used those prompts to make my own images. It helped me get better prompts, but it took a lot of time. And even then, I still struggled because I couldn't come up with new prompts quickly and independently.
I didn't totally figure everything out by myself because I still needed help. But guess what? I found a way to speed up my process. I downloaded millions of prompts and put them into a Milvus vector database. Then, I created a way to fetch similar results based on simple prompts entered into a UI.
These prompts resulted in amazing images. One user who tested the prompts even found that they produced better results than what he was doing before with his regular prompts. He then combined his negative prompts with the system I created to generate the images he wanted. Even without the negative prompts, he found that he could use the system to create high quality images.
Both images are the same seed and use the same negative Prompt.
Left is the Prompt Quill Prompt
Still same seed but no negative Prompt
Both images drift away in quality, but the left Image keeps the image composition as well as pose while the image on the right drifts not just by quality but also by pose and background.
So we can see how the more detailed Prompt of prompt quill did help to keep the image close to what it was meant to look like
How Milvus Powers My Text-to-Image Prompt Generation
I created scripts to fetch and clean prompts from multiple sources. Then, I load the cleaned prompts into a vector database. Initially, I tried pgvector but found that it was too slow. After careful exploration, I chose Milvus for performance reasons, it was five times faster than pgvector with almost the same code.
Once the data is available in the Milvus Vector Store, the fun can start. I started by just asking the LLM to generate some nice prompts. It didn’t work right off the bat. The context and input wouldn’t match. So, I played around until I found that I needed to give the LLM some instructions telling it that it was a prompt engineer and added some example conversation history. This was enough to get it to start producing wonderful images.
What’s more is that I can run all of this on my local machine, in part because Milvus is able to run vector search so quickly. Most of the latency comes from running the embedding model and LLM. The vector search is so fast that the GPU has no real pause between the embedding vector creation and starting to produce the final output.
And we’re not done yet. There’s plenty more to be done, and now that this is available, people are adding prompts and new images daily.
Here is a diagram of the whole process so far:
Conclusion
By building Prompt Quill, I found myself with loads of great prompts in way less time than before. I also realized that the prompts my system makes are more robust than the ones people make by hand for special models. Those models need careful handling and special negative prompts to make good images. Negative prompts also tend to enhance the output of this system but the amount of change to the non-negative prompted images is not as large as with some of those hand-crafted prompts.
Roadmap
The next step is to add the same function for negative prompts. Negative prompts have a positive influence on how prompts can be used to generate images. In the future, I’ll be adding a second step to provide negative prompts as well. We’ll use the same process currently used to produce the prompts by comparing it to the generated prompts in the system.
I have published a simple UI to generate the prompts assuming the vector store is available. I will upload vector store data very soon and add the links to my GitHub. I would like to run this online so that people who don’t have the GPU resources will be able to get nice prompts too, if you feel you could help by sponsoring a long-term hosting solution please contact me.
A few examples the system produced the prompts for:
Start Free, Scale Easily
Try the fully-managed vector database built for your GenAI applications.
Try Zilliz Cloud for Free