
Guardrail Types

NeMo Guardrails applies guardrails at multiple stages of the LLM interaction pipeline. Each rail type serves a specific purpose and operates at a distinct point in the conversation flow.
[Figure: Programmable Guardrails Flow]

Overview of Rail Types

The five main types of guardrails are:
| Stage | Rail Type | Common Use Cases |
| --- | --- | --- |
| Before LLM | Input rails | Content safety, jailbreak detection, topic control, PII masking |
| RAG pipeline | Retrieval rails | Document filtering, chunk validation |
| Conversation | Dialog rails | Flow control, guided conversations |
| Tool calls | Execution rails | Action input/output validation |
| After LLM | Output rails | Response filtering, fact checking, sensitive data removal |
Input and Output rails are the most commonly used types. Start with these before implementing more advanced rail types.

1. Input Rails

Input rails are applied to user input before the LLM is invoked. They can validate, reject, or transform user messages.

Use Cases

Jailbreak Detection

Detect and block attempts to bypass safety measures or manipulate the LLM into harmful behavior.

Content Safety

Check user inputs for harmful, offensive, or inappropriate content before processing.

PII Masking

Detect and mask personally identifiable information like emails, phone numbers, or SSNs.

Topic Control

Ensure user requests stay within allowed topic boundaries.

Configuration Example

rails:
  input:
    flows:
      - check jailbreak
      - mask sensitive data on input
      - check input toxicity

Colang Example

define user ask about politics
  "What do you think about the government?"
  "Which party should I vote for?"

define flow politics input rail
  user ask about politics
  bot refuse to respond
  stop
Input rails can reject the entire request by calling stop, preventing any further processing.
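Custom input checks can also be implemented as Python actions. The sketch below is a minimal keyword-based jailbreak heuristic; the function name and patterns are illustrative, not part of the library (production deployments should prefer the built-in jailbreak detection rails or a dedicated classifier):

```python
# actions.py -- illustrative input check; names and patterns are hypothetical
import re
from typing import Optional

# Naive patterns that often appear in jailbreak attempts. A real deployment
# would use a dedicated model rather than regex heuristics.
JAILBREAK_PATTERNS = [
    r"ignore (all|previous|your) instructions",
    r"pretend (you are|to be)",
    r"developer mode",
]

async def check_jailbreak(context: Optional[dict] = None) -> bool:
    """Return False if the user message looks like a jailbreak attempt."""
    context = context or {}
    user_message = context.get("user_message", "")
    return not any(
        re.search(pattern, user_message, re.IGNORECASE)
        for pattern in JAILBREAK_PATTERNS
    )
```

Returning False from an input action lets the corresponding flow refuse the request and call stop before the LLM is ever invoked.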

2. Retrieval Rails

Retrieval rails operate in RAG (Retrieval Augmented Generation) scenarios, filtering and validating retrieved chunks before they’re used to prompt the LLM.

Use Cases

  • Chunk Relevance: Ensure retrieved documents are actually relevant to the query
  • PII Detection: Mask or remove sensitive data from retrieved content
  • Source Validation: Verify chunks come from trusted sources
  • Content Filtering: Remove inappropriate or outdated information

Configuration Example

rails:
  retrieval:
    flows:
      - check retrieval relevance
      - mask pii in chunks

Custom Retrieval Action

# actions.py
from typing import Optional

async def check_retrieval_relevance(
    context: Optional[dict] = None
) -> bool:
    """Filter out irrelevant retrieved chunks."""
    context = context or {}
    relevant_chunks = context.get("relevant_chunks", [])

    # Keep only chunks whose relevance score exceeds the threshold
    filtered_chunks = [
        chunk for chunk in relevant_chunks
        if chunk.get("score", 0) > 0.7
    ]

    # Write the filtered list back so downstream prompting uses it
    context["relevant_chunks"] = filtered_chunks
    return True
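A quick way to sanity-check the threshold logic above is to run the same filter on sample chunks (the texts and scores here are made up for illustration):

```python
# Illustrative data; scores are invented for the example
sample_chunks = [
    {"text": "Relevant passage", "score": 0.92},
    {"text": "Marginal passage", "score": 0.55},
    {"text": "Borderline passage", "score": 0.71},
]

# Same rule as the action above: keep chunks scoring above 0.7
filtered = [c for c in sample_chunks if c.get("score", 0) > 0.7]
print([c["text"] for c in filtered])  # ['Relevant passage', 'Borderline passage']
```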

3. Dialog Rails

Dialog rails influence how the conversation flows by operating on canonical form messages. They determine whether an action should be executed, whether the LLM should generate the next step, or whether a predefined response should be used instead.

Use Cases

Define specific paths the conversation should follow:
define flow greeting
  user express greeting
  bot express greeting
  bot ask how are you
  
  when user express feeling good
    bot express positive emotion
  else when user express feeling bad
    bot express empathy

Configuration Example

rails:
  dialog:
    flows:
      - enforce greeting flow
      - check topic boundaries

4. Execution Rails

Execution rails are applied to tool/action calls, validating both the input parameters and the output results before they’re used in the conversation.

Use Cases

  • Input Validation: Ensure action parameters are safe and well-formed
  • Output Sanitization: Filter sensitive data from action results
  • Authorization: Verify the user has permission to execute the action
  • Rate Limiting: Control how often certain actions can be called
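To make the rate-limiting idea concrete, the sketch below tracks call timestamps per action in a sliding window. The class name and limits are hypothetical; NeMo Guardrails does not ship this helper:

```python
import time
from collections import defaultdict

class ActionRateLimiter:
    """Allow at most `max_calls` invocations of an action per `window` seconds.

    Illustrative sketch only; not part of NeMo Guardrails.
    """

    def __init__(self, max_calls: int = 5, window: float = 60.0):
        self.max_calls = max_calls
        self.window = window
        self._calls = defaultdict(list)  # action name -> call timestamps

    def allow(self, action_name: str) -> bool:
        now = time.monotonic()
        calls = self._calls[action_name]
        # Drop timestamps that have aged out of the sliding window
        calls[:] = [t for t in calls if now - t < self.window]
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```

An execution rail could consult `allow()` before dispatching a tool call and refuse the action when it returns False.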

Configuration Example

rails:
  execution:
    flows:
      - validate action inputs
      - sanitize action outputs

Example: Database Query Validation

# actions.py
import re
from typing import Optional

from nemoguardrails.actions import action

@action(name="execute_sql_query")
async def execute_sql_query(query: str, context: Optional[dict] = None):
    """Execute a SQL query after validating it is read-only."""

    # Input rail: reject write operations. Word-boundary matching avoids
    # false positives on identifiers like "updated_at".
    dangerous_keywords = ["DROP", "DELETE", "UPDATE", "INSERT"]
    if any(
        re.search(rf"\b{keyword}\b", query, re.IGNORECASE)
        for keyword in dangerous_keywords
    ):
        raise ValueError("Only SELECT queries are allowed")

    # Execute the query (run_query is application-specific, simplified here)
    results = await run_query(query)

    # Output rail: strip sensitive columns before returning results
    filtered_results = remove_sensitive_columns(results)

    return filtered_results
Security Best Practice: Always isolate authentication information from the LLM. Never pass credentials as action parameters that the LLM might see.

5. Output Rails

Output rails are applied to the LLM-generated response before it’s returned to the user. They can reject, modify, or enhance the output.

Use Cases

Fact Checking

Verify factual claims in the response against trusted sources.

Hallucination Detection

Detect when the LLM generates information not grounded in context.

Content Moderation

Check for harmful, biased, or inappropriate content in responses.

PII Removal

Strip any accidentally generated personal information.
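A simple regex scrub can serve as a starting point for a PII-removal output rail; real deployments typically use a dedicated detector such as Presidio. The patterns below are illustrative and far from exhaustive:

```python
import re

# Illustrative patterns only; robust PII detection needs a dedicated library
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def remove_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders like [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```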

Configuration Example

rails:
  output:
    flows:
      - self check facts
      - self check hallucination
      - check output toxicity
      - remove pii from output

Colang Example

define flow output moderation
  bot ...
  $allowed = execute output_moderation_check
  
  if not $allowed
    bot inform cannot answer
    stop

Use Case Matrix

Different use cases benefit from different combinations of rail types:
| Use Case | Input | Retrieval | Dialog | Execution | Output |
| --- | --- | --- | --- | --- | --- |
| Content Safety | ✓ | | | | ✓ |
| Jailbreak Protection | ✓ | | | | |
| Topic Control | ✓ | | ✓ | | |
| PII Detection | ✓ | ✓ | | | ✓ |
| Knowledge Base / RAG | | ✓ | | | |
| Agentic Security | | | | ✓ | |
| Custom Rails | ✓ | ✓ | ✓ | ✓ | ✓ |

Combining Multiple Rails

Rails of different types work together to provide comprehensive protection:
models:
  - type: main
    engine: openai
    model: gpt-4o-mini

rails:
  # Stage 1: Validate user input
  input:
    flows:
      - check jailbreak
      - mask sensitive data on input
  
  # Stage 2: Filter retrieved content (RAG)
  retrieval:
    flows:
      - check retrieval relevance
  
  # Stage 3: Guide conversation flow
  dialog:
    flows:
      - enforce topic boundaries
  
  # Stage 4: Validate tool execution
  execution:
    flows:
      - validate action inputs
  
  # Stage 5: Check output quality
  output:
    flows:
      - self check facts
      - check output toxicity

Rail Execution Order

Rails are executed in this order for each conversation turn:
  1. Input rails → Process user message
  2. Dialog rails → Determine conversation flow
  3. Retrieval rails → Filter RAG chunks (if applicable)
  4. Execution rails → Validate tool calls (if applicable)
  5. Output rails → Validate bot response
Any rail can call stop to halt processing immediately. This is useful for rejecting inappropriate requests or responses.

Next Steps

Colang DSL

Learn how to write rail definitions using Colang

Guardrails Library

Explore pre-built guardrails you can use immediately