To detect hallucinations in RAG-generated answers, developers can use techniques that systematically compare generated claims against retrieved source material. These methods focus on ensuring factual consistency and grounding in the provided context. Below are three practical approaches:
1. Natural Language Inference (NLI) for Entailment Checks
NLI models evaluate whether a generated claim is logically supported by the retrieved text. For example, a claim like "The Eiffel Tower was completed in 1889" can be verified by running an NLI model (e.g., one based on DeBERTa or RoBERTa) on the retrieved context (premise) and the claim (hypothesis). The model classifies the relationship as entailment, contradiction, or neutral: entailment means the claim is grounded in the context, while contradiction or neutral flags a potential hallucination. Libraries like Hugging Face's transformers provide pre-trained NLI models for this purpose. However, NLI may miss nuanced or indirect support, so combining it with other methods improves reliability.
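A minimal sketch of this check using the transformers library. The checkpoint name (`roberta-large-mnli`) is one illustrative choice among many MNLI-finetuned models, not a required one:

```python
# Entailment check for RAG claims, assuming an MNLI-finetuned checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # assumption: any NLI-finetuned model works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def check_claim(context: str, claim: str) -> str:
    """Classify the (context, claim) pair; returns one of
    ENTAILMENT, NEUTRAL, or CONTRADICTION."""
    inputs = tokenizer(context, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[logits.argmax(dim=-1).item()]

context = "Construction of the Eiffel Tower finished in March 1889."
print(check_claim(context, "The Eiffel Tower was completed in 1889."))
```

Long contexts may need to be checked chunk by chunk, since NLI models truncate inputs beyond their maximum sequence length.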
2. Embedding-Based Semantic Similarity Analysis
This technique computes similarity scores between generated sentences and retrieved passages using embeddings (e.g., Sentence-BERT). For each claim in the answer, calculate cosine similarity against all retrieved chunks. If the highest similarity score falls below a threshold (e.g., 0.7), the claim is flagged for review. For instance, if the answer states, "Solar panels convert sunlight into electricity with 90% efficiency," but no retrieved text mentions efficiency percentages, the low similarity score would indicate a potential hallucination. This method is efficient but requires careful threshold tuning to balance precision and recall.
3. Fact Segmentation and Source Attribution
Break the generated answer into individual facts (e.g., using spaCy for entity/relation extraction) and map each to specific sections of the retrieved text. For example, if the answer claims, "The Treaty of Versailles was signed in 1919," a tool can check if "1919" appears in the retrieved documents. If no matching date or event is found, the claim is flagged. Some RAG systems implement citation mechanisms, where the generator explicitly references source passages. Automated scripts can then validate citations by checking if the cited text actually supports the claim. This approach works best when the RAG pipeline includes structured citation output.
These techniques can be combined for robust validation. For instance, use embedding similarity to filter low-confidence claims, then apply NLI for deeper verification. Implementing such checks helps reduce hallucinations while maintaining the efficiency of RAG systems.
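The two-stage combination described above can be sketched in a model-agnostic way: the similarity and entailment functions are injected as callables, so any embedding model or NLI model can be plugged in; the 0.7 threshold is an assumed default:

```python
# Two-stage validation: a cheap similarity filter routes only
# low-scoring claims to the more expensive NLI check.
from typing import Callable, List

def validate_answer(
    claims: List[str],
    chunks: List[str],
    similarity_fn: Callable[[str, str], float],  # e.g. embedding cosine similarity
    entails_fn: Callable[[str, str], bool],      # e.g. NLI: does chunk entail claim?
    threshold: float = 0.7,                      # assumed cutoff; tune per corpus
) -> List[str]:
    """Return claims that fail both the similarity filter and the NLI check."""
    hallucinations = []
    for claim in claims:
        best = max(similarity_fn(claim, chunk) for chunk in chunks)
        if best >= threshold:
            continue  # well supported by some chunk; skip the expensive NLI call
        if not any(entails_fn(chunk, claim) for chunk in chunks):
            hallucinations.append(claim)
    return hallucinations
```

Because NLI inference is the costly step, filtering with embeddings first keeps per-answer latency close to that of the similarity check alone in the common case where most claims are well grounded.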