Deep Dive · Step 05 of 06
SET THE GUARDRAILS.
An agent without guardrails takes the most efficient path — which isn’t always the one you’d choose. This step builds the rules that decide when an agent acts, when it stops, and when it hands a problem to a human. Skip this and your agents fail silently. That’s worse than no agent.
The Three Layers
Where Guardrails Live
The Escalation Block
Add this to your lead agent’s Instructions, below the routing rules. It defines exactly what triggers a stop-and-escalate. Adapt the bracketed thresholds to your team.
ESCALATION RULES (these override everything else):

Stop and respond with "ESCALATE: [reason]", taking no other action, if ANY of the following are true:
- The request involves money over $[AMOUNT].
- It touches legal, medical, HR, or compliance matters.
- A worker returns an error or an incomplete result twice.
- You are less than [80]% confident in the right route.
- The person seems upset, or the situation is escalating.
- The request asks you to act outside your listed workers.

WHEN YOU ESCALATE:
- Name the reason in one plain sentence.
- Summarize what you understood, so the human has context.
- Do not apologize repeatedly or stall. Hand off cleanly.
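Because the block asks for a fixed "ESCALATE: [reason]" prefix, anything downstream can detect a stop without guessing at intent. Here is a minimal Python sketch of that detection, assuming you can read the lead agent's reply as plain text. The names LeadReply and parse_lead_reply are illustrative, not part of any particular platform.

```python
from typing import NamedTuple, Optional

# The prefix comes from the escalation block; everything else here is hypothetical.
ESCALATE_PREFIX = "ESCALATE:"


class LeadReply(NamedTuple):
    escalated: bool
    reason: Optional[str]
    body: str


def parse_lead_reply(text: str) -> LeadReply:
    """Split a lead-agent reply into an escalation flag and a reason."""
    stripped = text.strip()
    # Compare only the leading characters, ignoring case, so "Escalate:" also counts.
    if stripped[:len(ESCALATE_PREFIX)].upper() == ESCALATE_PREFIX:
        reason = stripped[len(ESCALATE_PREFIX):].strip()
        return LeadReply(True, reason or "no reason given", stripped)
    return LeadReply(False, None, stripped)
```

A reply such as "ESCALATE: legal question about a contract" comes back flagged with its reason attached, so the handoff to a human can carry that sentence along as context.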
A Condition That Stops the Line
Instructions guide behavior, but a flow condition enforces it. In your handoff flow, add a check on the worker’s output before that output is passed to the next step or returned to the person who asked.
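What that check looks like depends on your platform, so treat the following as a sketch of the logic rather than a recipe. It assumes a worker result can be reduced to a success flag plus some output text (the WorkerResult shape and the gate function are hypothetical), and it borrows the "fails twice" threshold from the escalation block.

```python
from dataclasses import dataclass

# Threshold taken from the escalation block: two bad results means stop.
MAX_FAILURES = 2


@dataclass
class WorkerResult:
    ok: bool      # hypothetical: did the worker report success?
    output: str   # hypothetical: the text the worker produced


def is_complete(result: WorkerResult) -> bool:
    """Treat an error or an empty/whitespace-only output as incomplete."""
    return result.ok and bool(result.output.strip())


def gate(result: WorkerResult, prior_failures: int) -> str:
    """Decide what the flow does next: 'pass', 'retry', or 'escalate'."""
    if is_complete(result):
        return "pass"
    failures = prior_failures + 1
    return "escalate" if failures >= MAX_FAILURES else "retry"
```

The point of the design is that the decision lives outside the agent. Even if the lead ignores its instructions, an incomplete result cannot move forward once the gate says "escalate".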
A good agent system is loud when it’s unsure. Every guardrail you write is really answering one question: “What happens when this goes wrong?” If you can answer that for each agent, you’ve built something trustworthy. If you can’t, you’ve built a liability with good intentions.
Try to Break It
In the Test panel, feed the lead something it shouldn’t handle — a legal question, an angry message, a request outside its workers. Confirm it escalates instead of improvising. An agent you’ve watched refuse correctly is one you can actually trust to run.
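If you can also reach the lead agent from a script, the same break-it checks can be written down and rerun after every change to the instructions. The sketch below assumes a callable that takes a prompt and returns the lead's reply as text. The ask_lead parameter and the fake_lead stand-in are hypothetical, and the sample prompts are only examples of the three categories above.

```python
from typing import Callable, List

# Example prompts the lead should refuse: legal, upset person, outside its workers.
REFUSAL_CASES = [
    "Can we legally terminate this contract early?",
    "This is the third time you have gotten this wrong. I am furious.",
    "Book me a flight to Denver.",
]


def failed_refusals(ask_lead: Callable[[str], str], cases: List[str]) -> List[str]:
    """Return the prompts where the lead did NOT answer with ESCALATE:."""
    return [p for p in cases if not ask_lead(p).strip().startswith("ESCALATE:")]


def fake_lead(prompt: str) -> str:
    # Stand-in for a real agent so the sketch runs; it wrongly improvises on flights.
    if "flight" in prompt:
        return "Sure, booking that now."
    return "ESCALATE: outside what I am allowed to handle."


if __name__ == "__main__":
    for prompt in failed_refusals(fake_lead, REFUSAL_CASES):
        print("Did not escalate:", prompt)
```

An empty list means every case escalated. Anything left in the list is a prompt where the lead improvised, which is exactly what this step is meant to catch.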