Programmable Guardrails

What is NeMo Guardrails?

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational applications. Guardrails (or “rails” for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more. The toolkit enables developers building LLM-based applications to easily add programmable guardrails between the application code and the LLM, providing a critical control layer for production deployments.
NeMo Guardrails is developed by NVIDIA and is licensed under Apache 2.0. Learn more in the research paper published at EMNLP 2023.

Why NeMo Guardrails?

Building production-ready LLM applications requires more than just connecting to an API. You need control, safety, and reliability. NeMo Guardrails provides:

Safety & Trust

Define rails to guide and safeguard conversations. Prevent your LLM from engaging in unwanted topics or generating harmful content.

Controllable Dialog

Steer the LLM to follow pre-defined conversational paths, allowing you to design interactions following conversation design best practices.

Secure Tool Integration

Connect LLMs to other services (tools) seamlessly and securely with execution rails that validate inputs and outputs.

Multi-Layer Protection

Apply guardrails at five distinct points: input, dialog, retrieval, execution, and output for comprehensive control.

Key Features

Five Types of Guardrails

NeMo Guardrails supports five main types of guardrails that can be applied at different stages:
1. Input Rails

Applied to user input before processing. Can reject or alter the input (e.g., mask sensitive data, rephrase).

2. Dialog Rails

Influence how the LLM is prompted and control conversational flow. Determine whether actions should execute or predefined responses should be used.

3. Retrieval Rails

Applied to retrieved chunks in RAG scenarios. Can reject or modify chunks before they're used to prompt the LLM.

4. Execution Rails

Applied to the input and output of custom actions (tools) that the LLM needs to call.

5. Output Rails

Applied to LLM-generated output before it is returned to the user. Can reject or alter the output (e.g., remove sensitive data).
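As a sketch of how these attachment points look in practice, a guardrails configuration declares which flows run at each stage. The model settings below are illustrative, and the flow names `self check input` / `self check output` refer to rails from the built-in library:

```yaml
# config.yml (illustrative): the model to use, plus the rails to activate
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo

rails:
  input:        # input rails: run on user messages before processing
    flows:
      - self check input
  output:       # output rails: run on LLM output before it is returned
    flows:
      - self check output
```

Dialog rails are defined through Colang flows, and retrieval and execution rails are configured analogously for RAG chunks and custom actions.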

Built-in Guardrails Library

NeMo Guardrails comes with a comprehensive library of pre-built guardrails:
  • LLM Self-Checking: Input/output moderation, fact-checking, hallucination detection
  • NVIDIA Safety Models: Content safety, topic safety
  • Jailbreak & Injection Detection: Protect against prompt injection attacks
  • Third-Party Integrations: ActiveFence, AlignScore, and more
Built-in guardrails may not be suitable for all production use cases. Work with your team to ensure guardrails meet requirements for your industry and use case.
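For instance, the LLM self-checking rails are driven by prompts you write yourself; a minimal sketch, where the task name `self_check_input` follows the library's convention and the prompt wording is illustrative:

```yaml
# prompts.yml (illustrative): the prompt used by the self check input rail
prompts:
  - task: self_check_input
    content: |
      Your task is to check if the user message below complies with
      the company policy for talking with the company bot.

      User message: "{{ user_input }}"

      Question: Should the user message be blocked (Yes or No)?
      Answer:
```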

Colang Modeling Language

NeMo Guardrails introduces Colang, a modeling language specifically created for designing flexible, yet controllable, dialogue flows. Colang has a Python-like syntax and makes it easy to:
  • Define conversational patterns and flows
  • Specify allowed and disallowed topics
  • Create custom input/output validation rules
  • Implement complex dialog control logic
Two versions of Colang are supported: 1.0 (default) and 2.0. Both are fully supported with extensive examples.
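A minimal Colang 1.0 sketch of a dialog rail that keeps the bot away from politics (the flow, message, and example utterances are illustrative):

```colang
define user ask politics
  "what do you think about the president?"
  "which party should I vote for?"

define bot refuse to answer politics
  "I'm a support assistant, so I don't discuss politics."

define flow politics
  user ask politics
  bot refuse to answer politics
```

When a user message matches `ask politics`, the flow steers the LLM to the predefined refusal instead of generating a free-form answer.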

Use Cases

You can use programmable guardrails in different types of applications:
  • Enforce fact-checking and output moderation over a set of documents using Retrieval-Augmented Generation (RAG).
  • Ensure chatbots stay on topic and follow designed conversational flows for support, sales, or specialized domains.
  • Add guardrails to your custom LLM deployments for safer customer interactions.
  • Wrap guardrails around any LangChain chain or Runnable for enhanced safety and control.

LLM Vulnerability Protection

NeMo Guardrails provides robust protection against common LLM vulnerabilities:
(Figure: LLM vulnerability scan results)
The toolkit includes mechanisms for protecting against:
  • Jailbreak attempts
  • Prompt injection attacks
  • Sensitive data leakage
  • Off-topic conversations
  • Harmful content generation

How It Works

The basic flow is simple:
  1. Load a guardrails configuration from YAML and Colang files
  2. Create an LLMRails instance with your configuration
  3. Call the LLM through the guardrails layer using generate() or generate_async()
from nemoguardrails import LLMRails, RailsConfig

# Load configuration
config = RailsConfig.from_path("PATH/TO/CONFIG")
rails = LLMRails(config)

# Generate with guardrails
completion = rails.generate(
    messages=[{"role": "user", "content": "Hello world!"}]
)
The input and output format is compatible with OpenAI’s Chat Completions API, making integration straightforward.
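Because the format follows the Chat Completions convention, a multi-turn conversation is passed as a growing list of role/content message dicts. A sketch, with illustrative conversation content; the `rails.generate` call is commented out because it requires a configured model:

```python
# Multi-turn history uses the same role/content message dicts as
# OpenAI's Chat Completions API; append each reply before the next call.
history = [
    {"role": "user", "content": "What can you help me with?"},
    {"role": "assistant", "content": "I can answer questions about our products."},
    {"role": "user", "content": "Are there any discounts right now?"},
]
# completion = rails.generate(messages=history)
# history.append(completion)  # the reply is an assistant message dict
```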

What Makes NeMo Guardrails Different?

While there are many approaches to adding guardrails to LLMs, NeMo Guardrails stands out by:
  • Providing a unified toolkit that integrates multiple complementary approaches (moderation endpoints, critique chains, parsing, individual guardrails)
  • Offering dialog modeling capabilities that enable both precise dialog control and fine-grained guardrail application
  • Supporting multiple LLMs including OpenAI GPT-3.5/4, LLaMa-2, Falcon, Vicuna, Mosaic, and more
  • Being async-first with full support for both sync and async APIs
  • Integrating seamlessly with LangChain and other popular frameworks

Next Steps

Installation

Get NeMo Guardrails installed and ready to use

Quickstart

Build your first guardrails-protected application

GitHub Repository

View the source code and contribute

Official Documentation

Read the comprehensive documentation