
Architecture Overview

NeMo Guardrails uses an event-driven runtime architecture to process conversations through multiple stages of guardrails. Understanding this architecture helps you build more effective and efficient guardrails.

High-Level Architecture

The NeMo Guardrails library acts as an intermediary layer between your application code and LLM requests/responses:
  1. Application sends a user message to Guardrails
  2. Guardrails applies input rails, dialog rails, and potentially retrieval/execution rails
  3. Guardrails calls the LLM when needed
  4. Guardrails applies output rails to the response
  5. Guardrails returns the validated response to Application

Core Components

RailsConfig

The RailsConfig class is the central configuration object. Among other settings, it defines the LLM and embedding model configurations:
from nemoguardrails.rails.llm.config import Model, RailsConfig

config = RailsConfig(
    models=[
        Model(
            type="main",
            engine="openai",
            model="gpt-4o-mini"
        ),
        Model(
            type="embeddings",
            engine="openai",
            model="text-embedding-ada-002"
        )
    ]
)

LLMRails

The LLMRails class is the main entry point for using guardrails. It:
  • Initializes the runtime based on the Colang version (1.0 or 2.x)
  • Loads and registers all actions
  • Manages the conversation state
  • Orchestrates the guardrails processing pipeline
from nemoguardrails import LLMRails, RailsConfig

# Initialize
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)

# Use
response = rails.generate(
    messages=[{"role": "user", "content": "Hello!"}]
)

Key Methods

generate / generate_async — the main methods for getting LLM responses with guardrails applied:
# Sync version
response = rails.generate(
    messages=[{"role": "user", "content": "Hello"}]
)

# Async version
response = await rails.generate_async(
    messages=[{"role": "user", "content": "Hello"}]
)
generate_events_async — a lower-level method that returns the full event stream:
events = await rails.generate_events_async([
    {"type": "UtteranceUserActionFinished", "final_transcript": "Hello"}
])
register_action — registers custom actions dynamically:
async def my_action():
    return "result"

rails.register_action(my_action, name="my_action")

Runtime (Event-Driven Engine)

The runtime is the core event processing engine. There are two implementations:

RuntimeV1_0

Runtime for Colang 1.0:
  • Flows are active by default
  • Uses pattern matching for user/bot messages
  • Simpler, more implicit behavior

RuntimeV2_x

Runtime for Colang 2.0:
  • Explicit flow activation
  • More control over event handling
  • Supports advanced features like the ... operator
Both runtimes:
  • Process events in an async event loop
  • Execute actions and flows
  • Generate LLM prompts and parse responses
  • Maintain conversation state
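
Which runtime gets created is controlled by the colang_version setting in config.yml (a minimal fragment; the default is Colang 1.0 when the field is omitted):

```yaml
# config.yml
colang_version: "2.x"   # use RuntimeV2_x; omit or set "1.0" for RuntimeV1_0
```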

The Guardrails Processing Pipeline

Here’s what happens when a user message is processed:

Stage 1: Generate Canonical User Message

1. Receive User Utterance

An UtteranceUserActionFinished event is created with the user’s text:
{
    "type": "UtteranceUserActionFinished",
    "final_transcript": "Hello, how are you?"
}
2. Apply Input Rails

Any configured input rails are executed to validate/transform the input.
3. Generate User Intent

The generate_user_intent action:
  • Performs vector search on user message examples
  • Includes top 5 matches in the prompt
  • Asks the LLM to generate the canonical form
define flow generate user intent
  event UtteranceUserActionFinished(final_transcript="...")
  execute generate_user_intent
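
The vector-search step can be pictured with a toy sketch: given embeddings for the canonical-form examples, pick the most similar ones to the user message. This uses plain cosine similarity over made-up 2-d vectors; the real implementation uses the configured embeddings model and an index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k_examples(query_vec, examples, k=5):
    """Return the k canonical intents whose example vectors best match the query."""
    ranked = sorted(examples, key=lambda ex: cosine(query_vec, ex[1]), reverse=True)
    return [intent for intent, _ in ranked[:k]]

# Toy 2-d "embeddings" for canonical user intents
examples = [
    ("user express greeting", [1.0, 0.1]),
    ("user ask question",     [0.1, 1.0]),
    ("user say goodbye",      [0.9, 0.3]),
]
print(top_k_examples([1.0, 0.0], examples, k=2))
# → ['user express greeting', 'user say goodbye']
```

The selected examples are then embedded in the prompt as few-shot demonstrations for the canonical-form generation.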
4. Create UserIntent Event

A UserIntent event is generated:
{
    "type": "UserIntent",
    "intent": "user express greeting"
}

Stage 2: Decide Next Steps

Once the UserIntent event exists, the runtime determines what happens next.
If a flow matches, it executes directly:
define flow greeting
  user express greeting  # Matches!
  bot express greeting   # Execute this next
Next steps can be:
  1. Bot Message (BotIntent event) → Generate utterance
  2. Action Call (StartInternalSystemAction event) → Execute action

Stage 3: Execute Actions (if needed)

When an action is triggered:
1. Start Action

StartInternalSystemAction event is created
2. Apply Execution Rails

Validate action inputs if execution rails are configured
3. Execute Action

The Python function is called (async, non-blocking)
4. Apply Execution Rails

Validate action outputs
5. Finish Action

InternalSystemActionFinished event is created with the result
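
The action lifecycle above can be sketched as a toy event sequence. The event names follow the ones in this section; the real runtime also interleaves rails checks and carries richer payloads, and check_balance is a hypothetical action.

```python
import asyncio

async def check_balance(user_id: str) -> dict:
    """Stand-in async action; a real one might call a database or API."""
    await asyncio.sleep(0)  # simulate non-blocking I/O
    return {"balance": 42}

async def run_action(action, **params):
    """Emit start/finished events around an awaited async action call."""
    events = [{"type": "StartInternalSystemAction", "action_name": action.__name__}]
    result = await action(**params)  # the Python function runs without blocking the loop
    events.append({
        "type": "InternalSystemActionFinished",
        "action_name": action.__name__,
        "return_value": result,
    })
    return events

events = asyncio.run(run_action(check_balance, user_id="u1"))
for e in events:
    print(e["type"])
# → StartInternalSystemAction
# → InternalSystemActionFinished
```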

Stage 4: Generate Bot Utterance

When a BotIntent event is generated:
1. Retrieve Context (RAG)

If a knowledge base is configured:
define extension flow generate bot message
  priority 100
  bot ...
  execute retrieve_relevant_chunks
  execute generate_bot_message
The retrieve_relevant_chunks action:
  • Searches the knowledge base
  • Applies retrieval rails to filter chunks
  • Adds relevant chunks to the prompt context
2. Generate Utterance

The generate_bot_message action:
  • Performs vector search on bot message examples
  • Includes top 5 matches in the prompt
  • Includes retrieved chunks (if any)
  • Asks the LLM to generate the response
3. Apply Output Rails

Configured output rails validate the response:
define flow self check facts
  bot ...
  $check = execute fact_check
  if not $check
    bot inform cannot answer
    stop
4. Create StartUtteranceBotAction

Final event is created with the bot’s response

Complete Event Stream Example

Here’s a real event stream for processing “Hello”:
[
    # 1. User input
    {
        "type": "UtteranceUserActionFinished",
        "final_transcript": "Hello"
    },
    
    # 2. Canonical form generated
    {
        "type": "UserIntent",
        "intent": "user express greeting"
    },
    
    # 3. Bot intent decided
    {
        "type": "BotIntent",
        "intent": "bot express greeting"
    },
    
    # 4. Bot utterance generated
    {
        "type": "StartUtteranceBotAction",
        "script": "Hello there! How can I help you today?"
    }
]

Async-First Design

NeMo Guardrails is built with async/await from the ground up:

Why Async?

Better Concurrency

Multiple users can be served simultaneously. While one request waits for an LLM response, others continue processing.

Non-Blocking I/O

LLM calls, API requests, and database queries don’t block the event loop.

Efficient Resource Usage

Better CPU and memory utilization during I/O-bound operations.

Dual API

Both sync and async methods available for compatibility.
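
The concurrency benefit is easy to demonstrate with simulated LLM latency: two 0.1 s "calls" awaited concurrently finish in roughly 0.1 s rather than 0.2 s. This is a toy sketch, not the library's API.

```python
import asyncio
import time

async def fake_llm_call(prompt: str) -> str:
    """Simulate an I/O-bound LLM request with a sleep."""
    await asyncio.sleep(0.1)
    return f"response to {prompt!r}"

async def main():
    start = time.perf_counter()
    # Both requests wait on I/O at the same time instead of back-to-back.
    replies = await asyncio.gather(fake_llm_call("Hello"), fake_llm_call("Hi"))
    elapsed = time.perf_counter() - start
    return replies, elapsed

replies, elapsed = asyncio.run(main())
print(f"{len(replies)} replies in {elapsed:.2f}s")  # ~0.10s, not ~0.20s
```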

Sync vs Async Usage

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Synchronous (blocks until complete)
response = rails.generate(
    messages=[{"role": "user", "content": "Hello"}]
)

# Asynchronous (non-blocking)
import asyncio

async def chat():
    response = await rails.generate_async(
        messages=[{"role": "user", "content": "Hello"}]
    )
    return response

response = asyncio.run(chat())
Always use async methods (generate_async) in async contexts to avoid blocking the event loop.

Custom Async Actions

Actions should be async for better performance:
from nemoguardrails.actions import action
import httpx

@action()
async def fetch_weather(city: str):
    """Fetch weather data asynchronously."""
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.weather.com/{city}")
        return response.json()

Caching and Performance

NeMo Guardrails includes several caching mechanisms:

Model Output Caching

Cache LLM responses to avoid redundant calls:
# config.yml
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
    
model_cache:
  enabled: true
  maxsize: 50000
  stats:
    enabled: true
    log_interval: 60  # Log cache stats every 60 seconds
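
The effect of response caching can be sketched with a tiny in-memory cache keyed on the prompt (illustrative only; the library's actual cache implementation and key scheme may differ):

```python
from functools import lru_cache

CALLS = 0  # counts how many "real" LLM calls were made

@lru_cache(maxsize=50_000)  # mirrors the maxsize in the config above
def cached_llm(prompt: str) -> str:
    """Stand-in for an LLM call; identical prompts are served from cache."""
    global CALLS
    CALLS += 1
    return f"response to {prompt!r}"

cached_llm("Hello")
cached_llm("Hello")   # cache hit: no second "LLM call"
info = cached_llm.cache_info()
print(CALLS, info.hits, info.misses)  # → 1 1 1
```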

Embeddings Caching

Vector embeddings are cached automatically for:
  • User message examples
  • Bot message examples
  • Flow definitions
  • Knowledge base chunks

History Cache

The events history for user message sequences is cached to maintain state across turns.

Extending the Architecture

You can extend NeMo Guardrails in several ways. The most common is registering custom actions as Python functions:
# config/actions.py
from nemoguardrails.actions import action

@action()
async def my_custom_action(param: str):
    # Your logic here; return any JSON-serializable result
    return f"processed: {param}"

Configuration Loading

The configuration loading process:
1. Load config.yml

Parse YAML configuration for models, rails, instructions
2. Load .co files

Parse all Colang files in the config directory
3. Load config.py

Execute custom initialization code (if present)
4. Load actions.py

Import and register custom actions (if present)
5. Load library flows

Import built-in guardrails from the library
6. Initialize runtime

Create the appropriate runtime (V1_0 or V2_x)
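
The loading order above corresponds to a typical config directory layout (config.yml, config.py, and actions.py are the conventional names; flows.co is just an example, since any .co files in the directory are loaded):

```
config/
├── config.yml    # models, rails, instructions
├── flows.co      # Colang flows (any number of .co files)
├── config.py     # optional custom initialization
└── actions.py    # optional custom actions
```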

Next Steps

Build Your First Config

Create your first guardrails configuration

Custom Actions

Learn how to write custom Python actions

Advanced Flows

Master complex Colang flow patterns

Performance Tuning

Optimize your guardrails for production