NeMo Guardrails supports multiple LLM providers including OpenAI, NVIDIA NIM, Google Vertex AI, and HuggingFace. Each provider requires specific configuration in your config.yml file.

Supported LLM Providers

OpenAI

GPT-3.5, GPT-4, and other OpenAI models

NVIDIA NIM

Llama, Nemotron, and other NVIDIA-optimized models

Vertex AI

Google’s Gemini and PaLM models

HuggingFace

Open-source models via pipeline or endpoints

OpenAI Configuration

Basic OpenAI Setup

From examples/bots/hello_world/config.yml:
models:
  - type: main
    engine: openai
    model: gpt-4o-mini

OpenAI with Parameters

From examples/configs/sample/config.yml:
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
    parameters:
      temperature: 0.7
      max_tokens: 256
      top_p: 1.0
      frequency_penalty: 0.0
      presence_penalty: 0.0

Supported OpenAI Models

The model field (string) accepts any OpenAI model name, including:
  • gpt-4o - Latest GPT-4 Omni model
  • gpt-4o-mini - Smaller, faster GPT-4 Omni
  • gpt-4-turbo - GPT-4 Turbo
  • gpt-4 - GPT-4 base model
  • gpt-3.5-turbo - GPT-3.5 Turbo (chat)
  • gpt-3.5-turbo-instruct - GPT-3.5 Instruct (completion)

Environment Variables

Set your OpenAI API key:
export OPENAI_API_KEY="sk-..."

NVIDIA NIM Configuration

Basic NIM Setup

From examples/configs/llm/nim/config.yml:
models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct
    parameters:
      base_url: http://localhost:7331/v1

NIM with Cloud API

models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct
    parameters:
      base_url: https://integrate.api.nvidia.com/v1
      api_key: ${NVIDIA_API_KEY}

Multiple NIM Models

From examples/configs/content_safety/config.yml:
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
Other Llama models are available for the main type:
models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct
    # or
    # model: meta/llama-3.3-70b-instruct
    # model: meta/llama-3.1-405b-instruct

Environment Variables

For NVIDIA NIM cloud:
export NVIDIA_API_KEY="nvapi-..."

Google Vertex AI Configuration

From examples/configs/llm/vertexai/config.yml:
models:
  - type: main
    engine: vertexai
    model: gemini-1.0-pro

Vertex AI with Parameters

models:
  - type: main
    engine: vertexai
    model: gemini-1.5-pro
    parameters:
      temperature: 0.7
      max_output_tokens: 1024
      top_p: 0.95
      top_k: 40

Supported Vertex AI Models

The model field (string) accepts the following values:
  • gemini-1.5-pro - Latest Gemini Pro
  • gemini-1.0-pro - Gemini Pro 1.0
  • gemini-1.5-flash - Fast Gemini model

Authentication

Vertex AI requires Google Cloud authentication:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
export GOOGLE_CLOUD_PROJECT="your-project-id"

HuggingFace Configuration

HuggingFace Pipeline

From examples/configs/llm/hf_pipeline_llama2/config.yml:
models:
  - type: main
    engine: hf_pipeline_llama2
    model: meta-llama/Llama-2-7b-chat-hf
    parameters:
      device_map: auto
      torch_dtype: float16

Other HuggingFace Pipeline Examples

From examples/configs/llm/hf_pipeline_vicuna/config.yml:
models:
  - type: main
    engine: hf_pipeline_vicuna
    model: lmsys/vicuna-7b-v1.5

HuggingFace Endpoint

From examples/configs/llm/hf_endpoint/config.yml:
models:
  - type: main
    engine: hf_endpoint
    model: your-model-endpoint
    parameters:
      endpoint_url: https://your-endpoint.huggingface.cloud

Environment Variables

export HUGGINGFACEHUB_API_TOKEN="hf_..."

Model Parameters

Common parameters across providers:
  • temperature (float, default 0.7) - Controls randomness; lower values make output more deterministic
  • max_tokens (integer) - Maximum number of tokens to generate
  • top_p (float, default 1.0) - Nucleus sampling parameter
  • frequency_penalty (float, default 0.0) - Penalizes repeated tokens (OpenAI)
  • presence_penalty (float, default 0.0) - Penalizes tokens based on whether they have already appeared (OpenAI)
  • base_url (string) - Custom API endpoint URL (NIM, custom deployments)
  • api_key (string) - API key for authentication; use environment variables instead of hardcoding
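The examples on this page reference secrets with `${VAR}` placeholders (e.g. `api_key: ${NVIDIA_API_KEY}`). As an illustration of how this kind of substitution resolves against the process environment, here is a minimal stdlib-only sketch; the `expand_env_refs` helper is hypothetical and not part of the NeMo Guardrails API:

```python
import os
from string import Template


def expand_env_refs(params: dict) -> dict:
    """Expand ${VAR} references in string parameter values using os.environ.

    Hypothetical helper for illustration only; it is not part of the
    NeMo Guardrails API.
    """
    expanded = {}
    for key, value in params.items():
        if isinstance(value, str):
            # Template.substitute raises KeyError for unset variables,
            # which surfaces a missing key early instead of at request time.
            expanded[key] = Template(value).substitute(os.environ)
        else:
            expanded[key] = value
    return expanded


os.environ["NVIDIA_API_KEY"] = "nvapi-example"
params = {
    "base_url": "https://integrate.api.nvidia.com/v1",
    "api_key": "${NVIDIA_API_KEY}",
    "temperature": 0.7,
}
print(expand_env_refs(params)["api_key"])  # → nvapi-example
```

Failing fast on an unset variable is usually preferable to sending a literal `${NVIDIA_API_KEY}` string to the provider.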

Multiple Model Types

You can configure different models for different purposes:
models:
  # Main conversation model
  - type: main
    engine: openai
    model: gpt-4o

  # Content safety checking
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

  # Jailbreak detection
  - type: jailbreak_detection
    engine: nim
    model: nvidia/nemoguard-jailbreak-detection

  # Embedding model for retrieval
  - type: embeddings
    engine: openai
    model: text-embedding-ada-002
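The configuration above assumes one main model plus uniquely typed auxiliary models, each with type, engine, and model fields. A minimal sanity-check sketch of those assumptions (the `validate_models` helper is hypothetical, not part of the NeMo Guardrails API):

```python
def validate_models(models: list) -> list:
    """Return a list of problems found in a `models` config section.

    Hypothetical sanity check reflecting the shape used in this section;
    not part of the NeMo Guardrails API.
    """
    problems = []
    if sum(1 for m in models if m.get("type") == "main") != 1:
        problems.append("exactly one model of type 'main' is expected")
    seen = set()
    for m in models:
        for field in ("type", "engine", "model"):
            if field not in m:
                problems.append(f"missing field '{field}' in {m}")
        model_type = m.get("type")
        if model_type in seen:
            problems.append(f"duplicate model type '{model_type}'")
        seen.add(model_type)
    return problems


models = [
    {"type": "main", "engine": "openai", "model": "gpt-4o"},
    {"type": "content_safety", "engine": "nim",
     "model": "nvidia/llama-3.1-nemoguard-8b-content-safety"},
]
print(validate_models(models))  # → []
```

Running a check like this before starting the app catches copy-paste mistakes (a second `main` entry, a missing engine) earlier than a runtime error would.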

Streaming Support

Enable streaming for real-time responses. From examples/configs/streaming/config.yml:
models:
  - type: main
    engine: openai
    model: gpt-4

streaming: True
Use streaming in Python:
import asyncio

from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

async def main():
    # Stream the response chunk by chunk
    async for chunk in rails.stream_async(
        messages=[{"role": "user", "content": "Tell me a story"}]
    ):
        print(chunk, end="", flush=True)

asyncio.run(main())

Custom LLM Providers

You can add custom LLM providers by registering a LangChain-compatible LLM class:
# config/config.py
from typing import Any, List, Optional

from langchain_core.language_models.llms import LLM

from nemoguardrails.llm.providers import register_llm_provider


class CustomLLM(LLM):
    """Custom LLM implementing the LangChain LLM interface."""

    @property
    def _llm_type(self) -> str:
        return "custom_engine"

    def _call(
        self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any
    ) -> str:
        # Your custom generation logic
        ...


# Register the provider under the engine name used in config.yml
register_llm_provider("custom_engine", CustomLLM)
Then use in config.yml:
models:
  - type: main
    engine: custom_engine
    model: your-custom-model

Best Practices

1. Use Environment Variables

Never hardcode API keys in config files:
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      api_key: ${OPENAI_API_KEY}  # Good
      # api_key: sk-hardcoded-key  # Bad!
2. Choose Appropriate Models

  • Use smaller models (e.g., gpt-4o-mini) for simple tasks
  • Use larger models (e.g., gpt-4o) for complex reasoning
  • Use specialized models for specific tasks (safety, embeddings)
3. Configure Timeouts

models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      timeout: 30  # seconds
4. Test Locally First

Use local NIM deployments for development before switching to cloud APIs.

Testing LLM Configuration

Run an interactive chat session to verify that the configured LLM responds:
nemoguardrails chat --config ./config

Troubleshooting

Ensure environment variables are set:
echo $OPENAI_API_KEY
echo $NVIDIA_API_KEY
If empty, export them before running:
export OPENAI_API_KEY="your-key-here"
For NIM local deployments, verify the server is running:
curl http://localhost:7331/v1/models
Finally, verify that the model name in config.yml exactly matches a model the provider serves (for example, meta/llama3-8b-instruct for NIM, not llama3-8b).
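The environment-variable checks above can be scripted as a preflight step. A minimal stdlib-only sketch (the `missing_env_vars` helper is hypothetical; adjust `required` to the providers in your config.yml):

```python
import os


def missing_env_vars(required):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]


# Hypothetical example list; include only the keys your providers need
required = ["OPENAI_API_KEY", "NVIDIA_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```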

Next Steps

  • config.yml Schema - Learn about all configuration options
  • Guardrails Library - Explore built-in guardrails for different LLMs