Why GPTCache?
Built on a strong and growing community.
Storing LLM responses in a cache significantly reduces the time it takes to return an answer when the same request has been made before: the response is served straight from the cache instead of being regenerated by the model, which improves the overall performance of your application.
Most LLM services charge fees based on a combination of the number of requests and token count. Caching LLM responses can reduce the number of API calls made to the service, translating into cost savings. Caching is particularly relevant when dealing with high traffic levels, where API call expenses can be substantial.
Caching LLM responses can improve the scalability of your application by reducing the load on the LLM service. Caching helps avoid bottlenecks and ensures that the application can handle a growing number of requests.
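As a minimal sketch of what this looks like in practice, the snippet below follows GPTCache's documented exact-match usage: the `gptcache.adapter.openai` module mirrors the pre-1.0 `openai` client, so the second, identical question is answered from the cache rather than by a new API call. The `ask` helper and the question string are illustrative, not part of GPTCache itself.

```python
from gptcache import cache
from gptcache.adapter import openai  # drop-in stand-in for the pre-1.0 openai client

# Initialize an exact-match cache and read the key from the OPENAI_API_KEY env var.
cache.init()
cache.set_openai_key()

def ask(question: str) -> str:  # illustrative helper
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    return response["choices"][0]["message"]["content"]

print(ask("What is GPTCache?"))  # first call goes to the LLM service
print(ask("What is GPTCache?"))  # identical call is served from the cache
```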
A semantic cache can be a valuable tool for reducing costs during the development phase of an LLM (Large Language Model) app. An LLM application normally needs a connection to LLM APIs even during development, which can become costly. Because GPTCache offers the same interface as the LLM APIs and can serve LLM-generated or mocked-up data, it lets you verify your application's features without connecting to the LLM APIs or the network, as sketched below.
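One way to do this, based on the `put`/`get` helpers shown in GPTCache's API examples, is to seed the cache with mocked-up answers and read them back exactly as your app would; the question and answer strings here are invented for illustration.

```python
from gptcache import cache
from gptcache.adapter.api import put, get
from gptcache.processor.pre import get_prompt

# Use the raw prompt string as the cache key; no embedding model, API key,
# or network connection is required.
cache.init(pre_embedding_func=get_prompt)

# Seed the cache with a mocked-up response, then retrieve it offline.
put("What does GPTCache do?", "It caches LLM responses to cut cost and latency.")
print(get("What does GPTCache do?"))
```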
A semantic cache sits closer to the user than the LLM service, reducing the time it takes to retrieve a response. By cutting network latency, you can improve the overall user experience.
LLM services frequently enforce rate limits, which are constraints that APIs place on the number of times a user or client can access the server within a given timeframe. Hitting a rate limit means that additional requests will be blocked until a certain period has elapsed, leading to a service outage. With GPTCache, you can quickly scale to accommodate an increasing volume of queries, ensuring consistent performance as your application's user base expands.
Overall, developing a semantic cache for storing LLM responses can offer various benefits, including improved performance, reduced expenses, better scalability, customization, and reduced network latency.
GPTCache was built with a modular design to make it easy for users to customize their semantic cache. Each module offers several options, so you can pick the combination that fits your needs.
GPTCache works with your application, your preferred LLM or LLM framework (ChatGPT, LangChain), a cache store (SQLite, PostgreSQL, MySQL, MariaDB, SQL Server, and Oracle), and a vector store (FAISS, Milvus, Zilliz Cloud).
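As a sketch of how these modules fit together, the example below follows GPTCache's similar-search setup: an ONNX embedding model, SQLite as the cache store, FAISS as the vector store, and a distance-based similarity evaluation. The other backends listed above are selected the same way, by name, though the parameters each one accepts may differ.

```python
from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

# Choose the modules: embedding model, cache store, vector store, and
# the evaluation used to decide whether a cached answer is similar enough.
onnx = Onnx()
data_manager = get_data_manager(
    CacheBase("sqlite"),
    VectorBase("faiss", dimension=onnx.dimension),
)
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()

# With this setup, semantically similar questions (not just identical ones)
# can be answered from the cache.
```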