POST /v1/chat/completions
Create a chat completion with guardrails applied. This endpoint is compatible with the OpenAI Chat Completions API, with additional guardrails-specific extensions.

Request
model: The LLM model to use for chat completion (e.g., “gpt-4o”, “llama-3.1-8b”).
messages: The list of messages in the current conversation.
stream: If set, partial message deltas are sent as server-sent events.
max_tokens: The maximum number of tokens to generate.
temperature: Sampling temperature to use (0.0 to 2.0).
top_p: Top-p sampling parameter (0.0 to 1.0).
stop: Stop sequences at which the API will stop generating further tokens.
presence_penalty: Presence penalty parameter (-2.0 to 2.0).
frequency_penalty: Frequency penalty parameter (-2.0 to 2.0).
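Because the endpoint is OpenAI-compatible, the standard parameters above map onto a familiar request body. A minimal sketch; all values are illustrative:

```python
import json

# Minimal OpenAI-compatible request body using the standard
# parameters documented above (illustrative values).
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this ticket."},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.95,
    "stop": ["\n\n"],
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "stream": False,
}

# Serialize to JSON for the POST body.
print(json.dumps(payload, indent=2))
```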
Guardrails Extensions
Guardrails-specific options:
config_id: The guardrails configuration ID to use.
config_ids: List of configuration IDs to combine. Cannot be used with config_id.
thread_id: The ID of an existing thread to continue (minimum 16 characters).
context: Additional context data for the conversation.
options: Additional generation options:
- rails: Which rails to enable ({"input": true, "output": true})
- log: Logging options ({"activated_rails": true, "llm_calls": true})
- output_vars: Variables to extract from context
state: State object to continue the interaction. Must contain an events or state key.

Response
id: Unique identifier for the chat completion.
object: Always “chat.completion”.
created: Unix timestamp of when the completion was created.
model: The model used for the completion.
choices: Array of completion choices.
index: The index of this choice.
finish_reason: The reason generation stopped: “stop”, “length”, or “content_filter”.
Guardrails-specific output data:
config_id: The configuration ID that was used.
state: Updated state object for continuing the conversation.
log: Generation log data (if requested):
- activated_rails: List of rails that were activated
- llm_calls: Details of LLM calls made
- stats: Performance statistics
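Putting the pieces together: the sketch below builds a request that uses the guardrails extensions and reads the guardrails-related fields back out of a response shaped like the one documented above. The top-level placement of the extension fields, the guardrails_data key, and all values are assumptions for illustration; check your server's actual schema.

```python
# Sketch of a guardrails-enabled request body. Extension-field
# placement (top level vs. nested) is an assumption.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hi, can you help me?"}],
    "config_id": "content-safety",  # guardrails configuration to apply
    "options": {
        "rails": {"input": True, "output": True},
        "log": {"activated_rails": True, "llm_calls": True},
    },
}

# A response with the fields documented above (illustrative values;
# "guardrails_data" is a hypothetical key for the guardrails output).
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1720000000,
    "model": "gpt-4o",
    "choices": [{"index": 0, "finish_reason": "stop"}],
    "guardrails_data": {
        "config_id": "content-safety",
        "log": {"activated_rails": [], "llm_calls": [], "stats": {}},
    },
}

# Detect whether a rail blocked the output via finish_reason.
choice = response["choices"][0]
if choice["finish_reason"] == "content_filter":
    print("Response was blocked by a guardrail")
else:
    print("finish_reason:", choice["finish_reason"])
```

A caller that wants to continue the same conversation would carry the returned state (or a thread_id) into the next request.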