What Are AI Hallucinations?
AI hallucinations occur when a language model (like ChatGPT) produces an answer that appears reasonable but is entirely false. These inaccurate answers are not the result of the model intentionally lying; they happen because the model's training data does not contain the information needed to answer correctly. For instance, data generated after September 2021 was not included in ChatGPT's training, so ChatGPT cannot respond accurately if you ask about it.
While ChatGPT will sometimes indicate that it doesn't have the answer, in other instances it may generate a wholly fabricated response.
The AI Hallucination Problem Explained
An AI hallucination occurs when an AI model generates incorrect information but presents it as if it were a fact. These hallucinations are often a result of limitations or biases in the training data and algorithms, leading to the production of inaccurate and potentially harmful content.
Why AI Hallucinates
AI hallucinations can occur for several reasons, including:
- Insufficient, outdated, or low-quality training data: An AI model is only as good as the data it's trained on. If the AI tool doesn't understand your prompt or lacks sufficient information, it may rely on the limited dataset it has been trained on to generate a response, even if it's inaccurate.
- Overfitting: An AI model trained on a limited dataset may memorize its inputs and corresponding outputs. As a result, it struggles to generalize to new data, which can lead to AI hallucinations.
Other factors can also contribute to AI hallucinations, such as idioms or slang expressions the model hasn't been trained on, and adversarial attacks, where prompts are intentionally designed to confuse the AI.
AI Hallucinations Examples
Here is an example we encountered while working with ChatGPT. When we asked it “Does Hugging Face support OpenAI Whisper?”, it acknowledged its limitation and gave an accurate answer (it didn't have the data), but the answer wasn't really useful:
As of my knowledge cutoff of September 2021, Hugging Face has not announced support for OpenAI Whisper. However, it is always possible that they may add support in the future. OpenAI Whisper is a new language model from OpenAI, and it may take some time for other companies and developers to integrate it into their platforms and products.
When we ask the same question via a chatbot that stores the latest content (GitHub repo, docs, READMEs, etc.) in a vector database, we get a totally different, and accurate, result:
Yes, Hugging Face supports OpenAI Whisper. The FlaxWhisperModel class is available in the transformers library, which is a wrapper around the OpenAI Whisper model. You can use this class to load the pre-trained OpenAI Whisper model and use it for various natural language processing tasks such as text classification, question answering, and text generation.
How Can You Reduce AI Hallucinations?
You can do a few things to help reduce or eliminate AI hallucinations. The first is to give the LLM specific instructions when asking your question. For example, you can ask it to respond only with “yes” or “no,” ask it to provide references so you can verify its accuracy, or lower the temperature setting to make its output less random.
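The instructions above can be sketched as request parameters. This is a generic sketch: the field names (`model`, `temperature`, `messages`) mirror common chat-completion APIs but are not tied to any specific vendor, and `some-llm` is a placeholder model name.

```python
def build_constrained_request(question: str) -> dict:
    """Build a hypothetical chat-completion request that discourages hallucination.

    The schema below is an assumption modeled on common LLM APIs,
    not a specific vendor's interface.
    """
    system = (
        "Answer only with 'yes' or 'no', followed by the references you relied on. "
        "If you do not know, say 'I don't know' instead of guessing."
    )
    return {
        "model": "some-llm",   # placeholder model name
        "temperature": 0,      # low temperature = less random, more literal answers
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    }

request = build_constrained_request("Does Hugging Face support OpenAI Whisper?")
```

Setting the temperature to 0 makes the model pick its most likely tokens, which reduces creative, but potentially fabricated, phrasing; asking for references gives you something to fact-check.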
In addition, you can provide the LLM with the actual data it needs to formulate the answer. This is done by converting your data into vector embeddings and storing them in a vector database. In most cases, a chatbot front end sits between the user and the database: the user asks a question, the question is converted into a vector embedding, and an Approximate Nearest Neighbor (ANN) search finds semantically similar items in the database. Those items are then presented to the LLM as context so it can generate an accurate response.
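The retrieval step described above can be sketched end to end. This toy example replaces a real embedding model and ANN index with hand-made three-dimensional vectors and an exact cosine-similarity scan; in production, the embeddings come from an embedding model and the search runs against a vector database such as Milvus.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector database": each document stored with a made-up embedding.
# Real systems index these vectors for Approximate Nearest Neighbor search
# instead of scanning linearly.
knowledge_base = [
    ("Hugging Face transformers includes a wrapper for OpenAI Whisper.", [0.9, 0.1, 0.0]),
    ("Milvus is an open-source vector database.", [0.1, 0.8, 0.2]),
    ("READMEs describe how to install a project.", [0.0, 0.2, 0.9]),
]

def retrieve(question_embedding, top_k=1):
    """Return the top_k most semantically similar documents (exact scan here)."""
    ranked = sorted(
        knowledge_base,
        key=lambda doc: cosine_similarity(question_embedding, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

# Pretend this vector is the embedding of the user's question.
question_embedding = [0.85, 0.15, 0.05]
context = retrieve(question_embedding)[0]
prompt = (
    f"Answer using only this context:\n{context}\n\n"
    "Question: Does Hugging Face support OpenAI Whisper?"
)
```

The retrieved document is injected into the prompt as context, so the LLM answers from your data rather than from whatever its training set happened to contain.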
Does Zilliz Help with AI Hallucinations?
Zilliz Cloud (and Milvus) helps reduce AI hallucinations by storing and querying a knowledge base that has been converted into vector embeddings. OSSChat is a sample application that demonstrates how a vector database can be used to reduce these hallucinations. Here are some more resources on how you can use Zilliz to reduce hallucinations: