Hallucination detection helps identify when your bot generates responses that are inconsistent or potentially fabricated.
Overview
The hallucination detection guardrail uses a self-consistency approach:
- Generates multiple responses to the same prompt
- Compares responses for agreement
- Flags potential hallucinations when responses diverge
This is useful for:
- Detecting fabricated information
- Identifying low-confidence responses
- Improving reliability in critical applications
- Providing warnings about uncertain answers
Quick Start
How It Works
The hallucination detector:
- Takes the original prompt used to generate the bot’s response
- Generates 2 additional responses with temperature=1.0
- Uses an LLM to check if all responses agree
- Returns True if a hallucination is detected, False otherwise
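The steps above can be sketched in plain Python. This is a minimal illustration of the self-consistency idea, not the library's implementation: the `sample` and `agrees` callables are hypothetical stand-ins for the LLM calls, and the empty-sample fallback is an assumption based on the behavior described under LLM Requirements.

```python
from typing import Callable, List


def detect_hallucination(
    prompt: str,
    bot_message: str,
    sample: Callable[[str, int, float], List[str]],  # hypothetical: returns n completions
    agrees: Callable[[str, List[str]], bool],        # hypothetical: LLM agreement check
    num_extra: int = 2,
    temperature: float = 1.0,
) -> bool:
    """Return True when the extra samples do not support bot_message."""
    # Step 1-2: re-sample the same prompt at high temperature.
    extra_responses = sample(prompt, num_extra, temperature)
    if not extra_responses:
        # Assumption: with nothing to compare against, report "no hallucination".
        return False
    # Step 3-4: a hallucination is flagged when the responses do not agree.
    return not agrees(bot_message, extra_responses)
```

In a real deployment both callables would wrap LLM requests; here they are injected so the control flow can be exercised with simple stubs.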
Configuration
Basic Configuration
config.yml
Blocking Mode
Block responses when hallucinations are detected:
flows.co
Warning Mode
Provide a warning instead of blocking:
config.yml
flows.co
Two Detection Modes
1. Blocking Mode (self check hallucination)
Blocks the response entirely if hallucination is detected:
flows.co
2. Warning Mode (hallucination warning)
Adds a disclaimer to potentially hallucinated responses:
flows.co
Example disclaimer messages:
- “The previous answer is prone to hallucination and may not be accurate. Please double check the answer using additional sources.”
- “The above response may have been hallucinated, and should be independently verified.”
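The difference between the two modes can be sketched as a small helper. This is an illustrative sketch only: `apply_mode` and the mode names `"blocking"` and `"warning"` are hypothetical and not part of the library, and picking a disclaimer at random is an assumption. The message strings are the ones quoted in this page.

```python
import random
from typing import List

DISCLAIMERS = [
    "The previous answer is prone to hallucination and may not be accurate. "
    "Please double check the answer using additional sources.",
    "The above response may have been hallucinated, and should be independently verified.",
]
REFUSAL = "I don't know the answer to that"


def apply_mode(bot_message: str, hallucinated: bool, mode: str) -> List[str]:
    """Return the messages the bot would send (hypothetical helper)."""
    if not hallucinated:
        # Nothing was flagged: the response passes through unchanged.
        return [bot_message]
    if mode == "blocking":
        # Blocking mode replaces the response entirely.
        return [REFUSAL]
    if mode == "warning":
        # Warning mode keeps the response and appends a disclaimer.
        return [bot_message, random.choice(DISCLAIMERS)]
    raise ValueError(f"unknown mode: {mode!r}")
```

Warning mode never hides the original answer, which is why it tends to be the less disruptive choice when false positives are likely.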
LLM Requirements
Required features:
- Support for the n parameter (to generate multiple completions)
- Beam search or similar multi-completion capability
Supported:
- OpenAI models (GPT-3.5, GPT-4)
- Models with a compatible n parameter
Not supported:
- Most non-OpenAI models
- Models without multi-completion support
If the LLM does not support the n parameter, hallucination detection returns False (no hallucination detected) and logs a warning.
Context Requirements
The hallucination detector needs:
- $bot_message: The bot’s response to check
- $_last_bot_prompt: The original prompt (automatically tracked)
If either value is missing, the check returns False.
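A guard for these two context values might look like the sketch below. It assumes the context is a plain mapping keyed by `bot_message` and `_last_bot_prompt` (the variable names above without the `$` prefix); `has_required_context` is a hypothetical helper, not a library function.

```python
from typing import Mapping, Optional


def has_required_context(context: Mapping[str, Optional[str]]) -> bool:
    """Return True only when both values needed for the check are present."""
    bot_message = context.get("bot_message")
    last_bot_prompt = context.get("_last_bot_prompt")
    # Empty strings and None both count as missing.
    return bool(bot_message and last_bot_prompt)
```

A caller would skip the self-consistency check and report False (no hallucination detected) whenever this guard fails.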
Behavior
With Rails Exceptions
config.yml
Raises SelfCheckHallucinationRailException when a hallucination is detected in blocking mode.
Without Rails Exceptions
In blocking mode: Bot says “I don’t know the answer to that” and aborts.
In warning mode: Bot adds a disclaimer about potential hallucination.
Activating Detection
Blocking Mode
Set $check_hallucination = True:
flows.co
Warning Mode
Set $hallucination_warning = True:
flows.co
Custom Flows
Create custom hallucination handling:
flows.co
Agreement Checking
The detector prompts the LLM to determine agreement:
prompts.yml
Performance Considerations
Best practices:
- Use selectively for important responses
- Consider using warning mode instead of blocking
- Only enable for general knowledge questions (not factual RAG responses)
- Monitor API costs carefully
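To get a rough sense of the overhead, the extra work per checked response can be counted from the steps in How It Works: 2 additional sampled responses plus 1 agreement check. The helper below is a hypothetical back-of-the-envelope estimate; it counts completions only and makes no assumption about tokens or pricing.

```python
def extra_completions(checked_responses: int, num_extra: int = 2) -> int:
    """Estimate additional LLM completions caused by hallucination checks."""
    # Each checked response adds num_extra samples plus one agreement check.
    return checked_responses * (num_extra + 1)
```

For example, checking 1,000 responses with the default of 2 extra samples adds 3,000 completions, which is why selective use matters.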
Temperature Settings
Extra responses are generated with a high temperature (temperature=1.0).
Implementation Details
The hallucination flows and actions are defined in:
- /nemoguardrails/library/hallucination/flows.co
- /nemoguardrails/library/hallucination/actions.py
Key action:
- SelfCheckHallucinationAction: Performs self-consistency check
Use Cases
Good use cases:
- General knowledge questions
- Creative or opinion-based responses
- Uncertain or ambiguous queries
- Non-critical information
Not recommended for:
- RAG-based factual responses (use fact checking instead)
- Time-sensitive information
- Deterministic computations
- Simple lookups