
Guardrail Types

NeMo Guardrails applies guardrails at multiple stages of the LLM interaction pipeline. Each rail type serves a specific purpose and operates at a distinct point in the conversation flow.
[Figure: Programmable Guardrails Flow]

Overview of Rail Types

The five main types of guardrails are:
| Stage | Rail Type | Common Use Cases |
| --- | --- | --- |
| Before LLM | Input rails | Content safety, jailbreak detection, topic control, PII masking |
| RAG pipeline | Retrieval rails | Document filtering, chunk validation |
| Conversation | Dialog rails | Flow control, guided conversations |
| Tool calls | Execution rails | Action input/output validation |
| After LLM | Output rails | Response filtering, fact checking, sensitive data removal |
Input and Output rails are the most commonly used types. Start with these before implementing more advanced rail types.

1. Input Rails

Input rails are applied to user input before the LLM is invoked. They can validate, reject, or transform user messages.

Use Cases

Jailbreak Detection

Detect and block attempts to bypass safety measures or manipulate the LLM into harmful behavior.

Content Safety

Check user inputs for harmful, offensive, or inappropriate content before processing.

PII Masking

Detect and mask personally identifiable information like emails, phone numbers, or SSNs.

Topic Control

Ensure user requests stay within allowed topic boundaries.

Configuration Example

rails:
  input:
    flows:
      - check jailbreak
      - mask sensitive data on input
      - check input toxicity

Colang Example

define user ask about politics
  "What do you think about the government?"
  "Which party should I vote for?"

define flow politics input rail
  user ask about politics
  bot refuse to respond
  stop
Input rails can reject the entire request by calling stop, preventing any further processing.
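Custom input checks can also be implemented as Python actions. The sketch below is a minimal keyword-based jailbreak heuristic; the function name and patterns are illustrative, not part of the library (production deployments should prefer the built-in jailbreak detection rails or a dedicated classifier):

```python
# actions.py -- illustrative input check; names and patterns are hypothetical
import re
from typing import Optional

# Naive patterns that often appear in jailbreak attempts. A real deployment
# would use a dedicated model rather than regex heuristics.
JAILBREAK_PATTERNS = [
    r"ignore (all|previous|your) instructions",
    r"pretend (you are|to be)",
    r"developer mode",
]

async def check_jailbreak(context: Optional[dict] = None) -> bool:
    """Return False if the user message looks like a jailbreak attempt."""
    context = context or {}
    user_message = context.get("user_message", "")
    return not any(
        re.search(pattern, user_message, re.IGNORECASE)
        for pattern in JAILBREAK_PATTERNS
    )
```

Returning False from an input action lets the corresponding flow refuse the request and call stop before the LLM is ever invoked.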

2. Retrieval Rails

Retrieval rails operate in RAG (Retrieval Augmented Generation) scenarios, filtering and validating retrieved chunks before they’re used to prompt the LLM.

Use Cases

  • Chunk Relevance: Ensure retrieved documents are actually relevant to the query
  • PII Detection: Mask or remove sensitive data from retrieved content
  • Source Validation: Verify chunks come from trusted sources
  • Content Filtering: Remove inappropriate or outdated information

Configuration Example

rails:
  retrieval:
    flows:
      - check retrieval relevance
      - mask pii in chunks

Custom Retrieval Action

# actions.py
from typing import Optional

async def check_retrieval_relevance(
    context: Optional[dict] = None
) -> bool:
    """Filter out irrelevant retrieved chunks."""
    context = context or {}
    relevant_chunks = context.get("relevant_chunks", [])

    # Keep only chunks whose relevance score exceeds the threshold
    filtered_chunks = [
        chunk for chunk in relevant_chunks
        if chunk.get("score", 0) > 0.7
    ]

    # Write the filtered list back so downstream prompting uses it
    context["relevant_chunks"] = filtered_chunks
    return True
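A quick way to sanity-check the threshold logic above is to run the same filter on sample chunks (the texts and scores here are made up for illustration):

```python
# Illustrative data; scores are invented for the example
sample_chunks = [
    {"text": "Relevant passage", "score": 0.92},
    {"text": "Marginal passage", "score": 0.55},
    {"text": "Borderline passage", "score": 0.71},
]

# Same rule as the action above: keep chunks scoring above 0.7
filtered = [c for c in sample_chunks if c.get("score", 0) > 0.7]
print([c["text"] for c in filtered])  # ['Relevant passage', 'Borderline passage']
```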

3. Dialog Rails

Dialog rails influence how the conversation flows by operating on canonical form messages. They determine whether an action should be executed, whether the LLM should generate the next step, or whether a predefined response should be used instead.

Use Cases

Define specific paths the conversation should follow:
define flow greeting
  user express greeting
  bot express greeting
  bot ask how are you
  
  when user express feeling good
    bot express positive emotion
  else when user express feeling bad
    bot express empathy

Configuration Example

rails:
  dialog:
    flows:
      - enforce greeting flow
      - check topic boundaries

4. Execution Rails

Execution rails are applied to tool/action calls, validating both the input parameters and the output results before they’re used in the conversation.

Use Cases

  • Input Validation: Ensure action parameters are safe and well-formed
  • Output Sanitization: Filter sensitive data from action results
  • Authorization: Verify the user has permission to execute the action
  • Rate Limiting: Control how often certain actions can be called
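To make the rate-limiting idea concrete, the sketch below tracks call timestamps per action in a sliding window. The class name and limits are hypothetical; NeMo Guardrails does not ship this helper:

```python
import time
from collections import defaultdict

class ActionRateLimiter:
    """Allow at most `max_calls` invocations of an action per `window` seconds.

    Illustrative sketch only; not part of NeMo Guardrails.
    """

    def __init__(self, max_calls: int = 5, window: float = 60.0):
        self.max_calls = max_calls
        self.window = window
        self._calls = defaultdict(list)  # action name -> call timestamps

    def allow(self, action_name: str) -> bool:
        now = time.monotonic()
        calls = self._calls[action_name]
        # Drop timestamps that have aged out of the sliding window
        calls[:] = [t for t in calls if now - t < self.window]
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```

An execution rail could consult `allow()` before dispatching a tool call and refuse the action when it returns False.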

Configuration Example

rails:
  execution:
    flows:
      - validate action inputs
      - sanitize action outputs

Example: Database Query Validation

# actions.py
import re
from typing import Optional

from nemoguardrails.actions import action

@action(name="execute_sql_query")
async def execute_sql_query(query: str, context: Optional[dict] = None):
    """Execute a SQL query after validating it is read-only."""

    # Input rail: reject write operations. Word-boundary matching avoids
    # false positives on identifiers like "updated_at".
    dangerous_keywords = ["DROP", "DELETE", "UPDATE", "INSERT"]
    if any(
        re.search(rf"\b{keyword}\b", query, re.IGNORECASE)
        for keyword in dangerous_keywords
    ):
        raise ValueError("Only SELECT queries are allowed")

    # Execute the query (run_query is application-specific, simplified here)
    results = await run_query(query)

    # Output rail: strip sensitive columns before returning results
    filtered_results = remove_sensitive_columns(results)

    return filtered_results
Security Best Practice: Always isolate authentication information from the LLM. Never pass credentials as action parameters that the LLM might see.

5. Output Rails

Output rails are applied to the LLM-generated response before it’s returned to the user. They can reject, modify, or enhance the output.

Use Cases

Fact Checking

Verify factual claims in the response against trusted sources.

Hallucination Detection

Detect when the LLM generates information not grounded in context.

Content Moderation

Check for harmful, biased, or inappropriate content in responses.

PII Removal

Strip any accidentally generated personal information.
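A simple regex scrub can serve as a starting point for a PII-removal output rail; real deployments typically use a dedicated detector such as Presidio. The patterns below are illustrative and far from exhaustive:

```python
import re

# Illustrative patterns only; robust PII detection needs a dedicated library
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def remove_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders like [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```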

Configuration Example

rails:
  output:
    flows:
      - self check facts
      - self check hallucination
      - check output toxicity
      - remove pii from output

Colang Example

define flow output moderation
  bot ...
  $allowed = execute output_moderation_check
  
  if not $allowed
    bot inform cannot answer
    stop

Use Case Matrix

Different use cases benefit from different combinations of rail types:
| Use Case | Input | Retrieval | Dialog | Execution | Output |
| --- | --- | --- | --- | --- | --- |
| Content Safety | ✓ | | | | ✓ |
| Jailbreak Protection | ✓ | | | | |
| Topic Control | ✓ | | ✓ | | |
| PII Detection | ✓ | ✓ | | | ✓ |
| Knowledge Base / RAG | | ✓ | | | |
| Agentic Security | | | | ✓ | |
| Custom Rails | ✓ | ✓ | ✓ | ✓ | ✓ |

Combining Multiple Rails

Rails of different types work together to provide comprehensive protection:
models:
  - type: main
    engine: openai
    model: gpt-4o-mini

rails:
  # Stage 1: Validate user input
  input:
    flows:
      - check jailbreak
      - mask sensitive data on input
  
  # Stage 2: Filter retrieved content (RAG)
  retrieval:
    flows:
      - check retrieval relevance
  
  # Stage 3: Guide conversation flow
  dialog:
    flows:
      - enforce topic boundaries
  
  # Stage 4: Validate tool execution
  execution:
    flows:
      - validate action inputs
  
  # Stage 5: Check output quality
  output:
    flows:
      - self check facts
      - check output toxicity

Rail Execution Order

Rails are executed in this order for each conversation turn:
  1. Input rails → Process user message
  2. Dialog rails → Determine conversation flow
  3. Retrieval rails → Filter RAG chunks (if applicable)
  4. Execution rails → Validate tool calls (if applicable)
  5. Output rails → Validate bot response
Any rail can call stop to halt processing immediately. This is useful for rejecting inappropriate requests or responses.

Next Steps

Colang DSL

Learn how to write rail definitions using Colang

Guardrails Library

Explore pre-built guardrails you can use immediately