Guardrails Reference
Guardrails are runtime checks that enforce policies on LLM responses. They use the same check types as assertions and evals, but run during generation and can actively modify or block content.
Guardrails vs Assertions
```mermaid
graph TB
    subgraph "Guardrails"
        V1["Runtime Enforcement"]
        V2["Can Abort Streaming"]
        V3["Policy Compliance"]
        V4["Defined in PromptConfig"]
    end

    subgraph "Assertions"
        A1["Test Verification"]
        A2["Check After Complete"]
        A3["Verify Expectations"]
        A4["Defined in Scenario"]
    end

    style V1 fill:#f99
    style A1 fill:#9f9
```

Key Differences:
| Aspect | Guardrails | Assertions |
|---|---|---|
| Purpose | Enforce policies | Verify behavior |
| When | During generation | After generation |
| Streaming | Can abort early | Runs after complete |
| Defined In | PromptConfig | Scenario |
| Failure | Policy violation | Test failure |
How Guardrails Work
Non-Streaming Mode

```mermaid
sequenceDiagram
    participant LLM
    participant Response
    participant Guardrails
    participant Result

    LLM->>Response: Generate Complete Response
    Response->>Guardrails: Validate Content

    alt Validation Passes
        Guardrails-->>Result: Return Response
    else Validation Fails
        Guardrails-->>Result: Validation Error
    end
```

Streaming Mode

```mermaid
sequenceDiagram
    participant LLM
    participant Stream
    participant Guardrails
    participant Result

    LLM->>Stream: Start Streaming

    loop Each Chunk
        Stream->>Guardrails: Validate Chunk

        alt Validation OK
            Guardrails-->>Result: Forward Chunk
        else Validation Fails
            Guardrails->>Stream: Abort Stream
            Stream-->>Result: Return Partial + Error
        end
    end
```

Guardrail Structure
Guardrails are defined in the `validators` section of a PromptConfig:

```yaml
spec:
  validators:
    - type: check_type
      params:
        param1: value1
      message: "Policy description"
      fail_on_violation: true
```

Enforcement Behavior
When a guardrail triggers and `fail_on_violation` is true (the default):
- Content blockers (`banned_words` / `content_excludes`) replace content with the `message` (or a default policy message)
- Length guards (`max_length` / `length`) truncate content to the configured maximum
- Other check types log the violation but do not modify content

When `fail_on_violation` is false, the guardrail evaluates and records results in `message.Validations` but does not modify content, which is equivalent to monitor-only mode.
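The enforcement rules above can be sketched in Go. This is a minimal illustration, not the library's implementation: the `Guardrail` and `Validation` types, the `Enforce` function, the `containsWord` helper, and the default policy wording are all hypothetical names invented for this example. Only the check type names and the replace, truncate, and log-only behavior come from the text.

```go
package main

import (
	"fmt"
	"strings"
)

// Guardrail is a hypothetical, simplified model of one validators entry.
type Guardrail struct {
	Type            string                    // e.g. "banned_words", "max_length"
	Message         string                    // policy message used on violation
	MaxCharacters   int                       // only used by length guards
	FailOnViolation bool                      // documented default is true; set explicitly here
	Violates        func(content string) bool // detection logic supplied by the caller
}

// Validation stands in for one recorded result (compare message.Validations).
type Validation struct {
	Type   string
	Passed bool
}

// defaultPolicyMessage is assumed wording, not the real default.
const defaultPolicyMessage = "Response blocked by policy."

// Enforce evaluates g against content and returns the content to deliver
// together with the recorded validation.
func Enforce(g Guardrail, content string) (string, Validation) {
	v := Validation{Type: g.Type, Passed: !g.Violates(content)}
	if v.Passed || !g.FailOnViolation {
		return content, v // monitor-only or no violation: never modify
	}
	switch g.Type {
	case "banned_words", "content_excludes":
		if g.Message != "" {
			return g.Message, v
		}
		return defaultPolicyMessage, v
	case "max_length", "length":
		runes := []rune(content)
		if len(runes) > g.MaxCharacters {
			runes = runes[:g.MaxCharacters]
		}
		return string(runes), v
	default:
		// Other check types only log the violation.
		fmt.Printf("guardrail %s violated: %s\n", g.Type, g.Message)
		return content, v
	}
}

// containsWord builds a case-insensitive substring detector.
func containsWord(word string) func(string) bool {
	return func(content string) bool {
		return strings.Contains(strings.ToLower(content), strings.ToLower(word))
	}
}

func main() {
	g := Guardrail{
		Type:            "banned_words",
		Message:         "Avoid absolute promises",
		FailOnViolation: true,
		Violates:        containsWord("definitely"),
	}
	out, v := Enforce(g, "This will definitely work.")
	fmt.Println(out, v.Passed)

	g.FailOnViolation = false // monitor-only: result recorded, content untouched
	out, v = Enforce(g, "This will definitely work.")
	fmt.Println(out, v.Passed)
}
```

In both calls the validation is recorded as failed; only the first call changes what the user would see.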
Guardrail-Compatible Check Types
Any check type can technically be used as a guardrail, but only certain built-in types have specific enforcement behavior. Others evaluate and record results without modifying content.
| Check Type | Aliases | Streaming | Enforcement |
|---|---|---|---|
| `content_excludes` | `banned_words` | Yes | Replaces content |
| `max_length` | `length` | Yes | Truncates |
| `sentence_count` | `max_sentences` | No | Logs violation |
| `field_presence` | `required_fields` | No | Logs violation |
See the Checks Reference for the full list of check types and their parameters.
Streaming Guardrail Behavior
Streaming-capable guardrails abort generation immediately when a violation is detected:

```mermaid
graph LR
    Start["Start"] --> C1["Chunk 1<br/>OK"]
    C1 --> C2["Chunk 2<br/>OK"]
    C2 --> C3["Chunk 3<br/>Banned Word!"]
    C3 --> Abort["Abort Stream"]

    style C3 fill:#f99
    style Abort fill:#f99
```

Benefits of streaming guardrails:
- Catch violations immediately, preventing bad content from reaching users
- Save API costs by aborting early (no wasted tokens on a response that will be blocked)
- Faster failure detection
Combining Guardrails
Multiple guardrails can enforce different policies simultaneously:

```yaml
validators:
  # Content safety (streaming)
  - type: banned_words
    params:
      words: ["guarantee", "promise", "definitely"]
    message: "Avoid absolute promises"

  # Length control (streaming)
  - type: max_length
    params:
      max_characters: 1000
      max_tokens: 250
    message: "Stay concise"

  # Structure (post-completion)
  - type: max_sentences
    params:
      max_sentences: 5
    message: "Maximum 5 sentences"
```

Execution order:
- Streaming guardrails run during generation
- Non-streaming guardrails run after completion
- First failure stops the validation chain
Examples
Content Safety

```yaml
validators:
  - type: banned_words
    params:
      words:
        - guarantee
        - promise
        - definitely
        - "100%"
    message: "Avoid absolute promises. Use phrases like 'we'll do our best' instead."
```

Response Length Control

```yaml
validators:
  - type: max_length
    params:
      max_characters: 2000
      max_tokens: 500
    message: "Keep responses concise"
```

Testing Guardrails in Arena

Create scenarios that verify guardrails fire correctly:

```yaml
spec:
  turns:
    - role: user
      content: "Will this definitely work?"
      assertions:
        - type: guardrail_triggered
          params:
            guardrail: banned_words
            assertions: true
          message: "Should catch 'definitely'"
```

Best Practices
- Order by speed: put streaming guardrails (e.g., `banned_words`, `max_length`) before post-completion checks (e.g., `max_sentences`).
- Use specific messages: give actionable guidance, not generic "invalid content" errors.
- Avoid overly broad rules: banning common words like "can" or "will" makes the system unusable.
- Mirror policies in the system prompt: tell the LLM about the same constraints so it self-corrects before guardrails need to fire.
Custom Guardrails
For custom enforcement logic beyond the built-in check types, implement the `hooks.ProviderHook` interface and register it via the SDK:

```go
conv, _ := sdk.Open("./app.pack.json", "chat",
    sdk.WithProviderHook(&MyCustomHook{}),
)
```

See Hooks Reference for the ProviderHook API and Checks Reference extensibility for adding new check types.
See Also
- Checks Reference: All check types and parameters
- Unified Check Model: How guardrails, assertions, and evals relate
- Assertions Reference: Test-time checks
- Hooks & Guardrails: Runtime hook system API
- Validation Tutorial: Step-by-step guardrails tutorial