Guardrails
Guardrails are composable validation layers that intercept content before it reaches an agent (input guards) and before it returns to the caller (output guards). They allow you to enforce policies, redact sensitive data, and abort runs when content violates your rules.
How Guards Work
A guard is a stateless class that receives content and returns one of three outcomes:
- Pass (return nil or GuardResult.pass): Content is acceptable, continue execution.
- Rewrite (GuardResult.rewrite): Replace the content with a modified version.
- Tripwire (GuardResult.tripwire): Abort the run immediately with an error.
class PiiRedactor < Agents::Guard
  guard_name "pii_redactor"
  description "Redacts Social Security numbers from content"

  def call(content, context)
    redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")
    # When nothing was redacted this returns nil, which counts as a pass.
    GuardResult.rewrite(redacted, message: "SSN redacted") if redacted != content
  end
end
Input Guards vs Output Guards
Input guards run before the first LLM call. They validate or transform the user’s message before the agent sees it. Use them for prompt injection detection, input sanitization, or content filtering.
Output guards run on the agent’s final response before it returns to the caller. They validate or transform what the agent says back. Use them for PII redaction, topic fencing, or response quality checks.
agent = Agents::Agent.new(
  name: "Support",
  instructions: "You are a helpful support agent.",
  input_guards: [PromptInjectionGuard.new],
  output_guards: [PiiRedactor.new, TopicFence.new]
)
Guards execute in array order. Each guard sees the output of the previous guard’s potential rewrite, forming a processing pipeline.
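The pipeline behavior can be illustrated with a minimal sketch in plain Ruby. It does not use the SDK: SketchResult, the two lambda guards, and run_guard_chain are hypothetical stand-ins, and the real runner may handle results differently.

```ruby
# Stand-in for the SDK's result type; the real GuardResult may differ.
SketchResult = Struct.new(:action, :content, :message, keyword_init: true)

# Two toy guards written as lambdas. Returning nil means "pass".
STRIP_WHITESPACE = lambda do |content, _context|
  stripped = content.strip
  SketchResult.new(action: :rewrite, content: stripped, message: "trimmed") if stripped != content
end

REDACT_SSN = lambda do |content, _context|
  redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")
  SketchResult.new(action: :rewrite, content: redacted, message: "SSN redacted") if redacted != content
end

# Runs guards in array order, feeding each rewrite into the next guard.
def run_guard_chain(guards, content, context = {})
  guards.reduce(content) do |current, guard|
    result = guard.call(current, context)
    result&.action == :rewrite ? result.content : current
  end
end

puts run_guard_chain([STRIP_WHITESPACE, REDACT_SSN], "  SSN: 123-45-6789  ")
# prints "SSN: [REDACTED]"
```

Here the second guard receives the already-trimmed string, not the original input, which is what "each guard sees the previous guard's rewrite" means in practice.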
Writing a Guard
Extend Agents::Guard and implement the call method:
class MaxLengthGuard < Agents::Guard
  guard_name "max_length"
  description "Tripwires if content exceeds maximum length"

  def initialize(max:)
    super()
    @max = max
  end

  def call(content, context)
    if content.length > @max
      GuardResult.tripwire(
        message: "Content exceeds #{@max} characters",
        metadata: { length: content.length, max: @max }
      )
    end
  end
end
Guards follow the same thread-safety principles as Tools:
- No execution state in instance variables (only configuration like @max above)
- All shared state flows through the context parameter
- Guard instances are immutable after creation
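The rules above can be sketched in plain Ruby. This is an illustration under assumptions, not the SDK's implementation: RepeatLimitGuard is a hypothetical class with no Agents::Guard superclass, and the context is assumed to behave like a Hash, which may not match the real context object.

```ruby
# Plain Ruby stand-in: configuration lives in instance variables,
# per-run state lives in the context (assumed here to be a Hash).
class RepeatLimitGuard
  def initialize(limit:)
    @limit = limit # configuration only
    freeze         # any later instance variable assignment raises FrozenError
  end

  # Returns :tripwire once the per-run counter exceeds the limit, else nil.
  def call(_content, context)
    context[:guard_calls] = context.fetch(:guard_calls, 0) + 1
    :tripwire if context[:guard_calls] > @limit
  end
end

SHARED_GUARD = RepeatLimitGuard.new(limit: 2)

# Two runs share one guard instance but keep independent counters.
run_a = {}
run_b = {}
3.times { SHARED_GUARD.call("hi", run_a) }
SHARED_GUARD.call("hi", run_b)

puts run_a[:guard_calls] # prints 3
puts run_b[:guard_calls] # prints 1
```

Because the instance is frozen and holds only configuration, the same object can be shared across concurrent runs without one run's state leaking into another.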
Tripwires
When a guard tripwires, the run aborts immediately. The result includes structured metadata about what happened:
result = runner.run("Tell me a secret")
if result.tripwired?
  puts result.guardrail_tripwire[:guard_name] # => "content_policy"
  puts result.guardrail_tripwire[:message]    # => "Response violates content policy"
  puts result.guardrail_tripwire[:metadata]   # => { category: "secrets" }
end
Tripwires short-circuit the guard chain. If guard 1 tripwires, guards 2 and 3 never run.
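The short-circuit can be demonstrated with a small plain-Ruby sketch. The helper names (logging_guard, first_tripwire) and the Hash-based result are hypothetical and stand in for whatever the SDK does internally.

```ruby
CALL_LOG = []

# Builds a toy guard that records its name, then passes (nil) or tripwires.
def logging_guard(name, trip: false)
  lambda do |_content|
    CALL_LOG << name
    { action: :tripwire, guard_name: name } if trip
  end
end

# Runs guards in order and returns the first tripwire, or nil if all pass.
def first_tripwire(guards, content)
  guards.each do |guard|
    result = guard.call(content)
    return result if result && result[:action] == :tripwire
  end
  nil
end

guards = [
  logging_guard("guard_1", trip: true),
  logging_guard("guard_2"),
  logging_guard("guard_3")
]
first_tripwire(guards, "Tell me a secret")
p CALL_LOG # prints ["guard_1"]
```

Only guard_1 appears in the log: the early return means later guards are never invoked, so they should not be relied on for side effects such as auditing.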
Fail-Open vs Fail-Closed
By default, guards are fail-open: if a guard raises an unexpected exception (not a Tripwire), the error is logged and the guard is skipped. This prevents a buggy guard from breaking your entire application.
For high-security contexts, you can configure fail-closed (strict) mode on the agent. In strict mode, any unexpected guard exception is converted to a tripwire:
# Fail-open (default): a buggy guard is skipped and the run continues
agent = Agents::Agent.new(
  name: "Support",
  input_guards: [PotentiallyBuggyGuard.new]
)

# Fail-closed: any guard error aborts the run
# (configured via GuardRunner strict: true, typically set at the runner level)
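The difference between the two modes can be sketched as a wrapper around a single guard call. This is plain Ruby with hypothetical names (run_guard_safely, BUGGY_GUARD) and a Hash standing in for a tripwire result; it only mirrors the behavior described here and is not the GuardRunner's actual code.

```ruby
# Toy guard that always fails with an ordinary bug, not a tripwire.
BUGGY_GUARD = ->(_content) { raise ArgumentError, "bug in guard" }

# Runs one guard. In strict mode an unexpected error becomes a tripwire
# Hash; otherwise it is logged to stderr and treated as a pass (nil).
def run_guard_safely(guard, content, strict: false)
  guard.call(content)
rescue StandardError => e
  if strict
    { action: :tripwire, message: "Guard error: #{e.message}" }
  else
    warn "guard skipped after error: #{e.message}"
    nil
  end
end

p run_guard_safely(BUGGY_GUARD, "hello")               # fail-open: prints nil
p run_guard_safely(BUGGY_GUARD, "hello", strict: true) # fail-closed: prints a tripwire Hash
```

Fail-open favors availability, fail-closed favors safety; a guard that enforces a hard policy is usually a better fit for the strict setting.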
Structured Output
When an agent uses response_schema, the LLM returns structured data (a Hash). Output guards still receive a String: the SDK serializes the Hash to JSON before the guard chain and deserializes it back after any rewrite. This means your guards always operate on Strings regardless of output format.
# This guard works on both plain text and structured output
class ContentFilter < Agents::Guard
  guard_name "content_filter"

  def call(content, context)
    # content is always a String (JSON for structured output)
    if content.include?("forbidden")
      GuardResult.tripwire(message: "Forbidden content detected")
    end
  end
end
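The serialize-then-deserialize round trip described for structured output can be approximated as follows. This sketch uses only Ruby's standard json library; guard_structured_output and SSN_GUARD are hypothetical names, and the SDK's real conversion may differ in detail.

```ruby
require "json"

# Toy guard: redacts SSNs in any String and returns the new String,
# or nil when nothing changed.
SSN_GUARD = lambda do |content|
  redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")
  redacted unless redacted == content
end

# Serializes Hash output to JSON for the guards, then parses it back.
def guard_structured_output(output, guards)
  structured = output.is_a?(Hash)
  text = structured ? JSON.generate(output) : output
  guards.each { |guard| text = guard.call(text) || text }
  structured ? JSON.parse(text) : text
end

p guard_structured_output({ "note" => "SSN 123-45-6789" }, [SSN_GUARD])
p guard_structured_output("SSN 123-45-6789", [SSN_GUARD])
```

Two consequences are visible in this sketch: a rewrite applied to structured output must leave valid JSON behind or the parse step fails, and JSON.parse returns String keys by default, so symbol keys would not survive the round trip unchanged.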
Guards Across Handoffs
Guards are agent-scoped. When agent A hands off to agent B:
- Agent A’s input guards ran once on the original user input (before the handoff decision).
- Agent A’s output guards do NOT run, because the handoff interrupts before agent A produces a final response.
- Agent B’s output guards run on agent B’s final response.
This means each agent enforces its own policies independently.
Callbacks and Instrumentation
Guard activity is observable through the callback system:
runner = Agents::Runner.with_agents(agent)
  .on_guard_triggered { |guard_name, phase, action, message, ctx|
    puts "Guard #{guard_name} (#{phase}): #{action} - #{message}"
  }
The callback fires for every non-pass result (rewrites and tripwires). It does not fire when guards pass.
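A rough sketch of that firing rule, in plain Ruby: notify_guard_result, RECORDER, and the Hash-shaped results are hypothetical and exist only to show which outcomes reach the callback.

```ruby
# Hypothetical dispatcher: invokes the callback only for non-pass results.
def notify_guard_result(callback, guard_name, phase, result)
  return if result.nil? || result[:action] == :pass

  callback.call(guard_name, phase, result[:action], result[:message])
end

EVENTS = []
RECORDER = ->(name, phase, action, message) { EVENTS << [name, phase, action, message] }

# A pass (nil) is silent; the rewrite and the tripwire are both reported.
notify_guard_result(RECORDER, "pii_redactor", :output, nil)
notify_guard_result(RECORDER, "pii_redactor", :output, { action: :rewrite, message: "SSN redacted" })
notify_guard_result(RECORDER, "max_length", :input, { action: :tripwire, message: "Too long" })

puts EVENTS.length # prints 2
```

Since passes are silent, a callback-based log shows only interventions; counting total guard executions would need a different hook.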
If OpenTelemetry instrumentation is installed, guard events produce agents.run.guard.* spans with attributes for guard name, phase (input/output), action (rewrite/tripwire), and message.
Complete Example
class PromptInjectionGuard < Agents::Guard
  guard_name "prompt_injection"
  description "Detects common prompt injection patterns"

  def call(content, context)
    patterns = [
      /ignore\s+(all\s+)?previous\s+instructions/i,
      /you\s+are\s+now\s+a/i,
      /disregard\s+(all\s+)?prior/i
    ]
    if patterns.any? { |p| content.match?(p) }
      GuardResult.tripwire(
        message: "Potential prompt injection detected",
        metadata: { input_length: content.length }
      )
    end
  end
end

class PiiRedactor < Agents::Guard
  guard_name "pii_redactor"
  description "Redacts SSNs and email addresses"

  def call(content, context)
    redacted = content
      .gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[SSN REDACTED]")
      .gsub(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/, "[EMAIL REDACTED]")
    GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content
  end
end

agent = Agents::Agent.new(
  name: "Support",
  instructions: "You are a helpful customer support agent.",
  input_guards: [PromptInjectionGuard.new],
  output_guards: [PiiRedactor.new]
)

runner = Agents::Runner.with_agents(agent)
  .on_guard_triggered { |name, phase, action, msg|
    Rails.logger.info("Guard #{name} (#{phase}): #{action}")
  }

result = runner.run("What is my email?")
# Output PII is automatically redacted before reaching the user