A Human-in-Loop Escalation Design for Agent Safety
Published 21 April 2026 · 10 min read
Actions arrive tagged
Every GeraNexus-emitted action carries a risk tag. The tag is produced by a risk classifier that considers: transaction value, category (medical / legal / financial / standard), reversibility, user history, agent history, anomaly score, and category-specific rules. The classifier is tuned per-category; the same transaction value triggers a different tier for a health booking than for a restaurant reservation.
The four tiers
T0 — auto. Low-value, low-risk, routine. No human touch. ~90%+ of transactions by volume. Example: a £12 dinner delivery.
T1 — sampled. Auto-commits but a random percentage (1-5% depending on category) is reviewed after the fact for calibration. Reviewer signal feeds back into the risk classifier. Does not block commits. Example: a £95 home-service booking.
T2 — mandatory pre-commit review. Blocks on human approval. Expected volume is in the single-digit percent range. The reviewer sees the action, context, consent chain, agent reasoning, and category policy, and can approve, reject, or request more context. Examples: a £2,400 medical procedure booking, a legal commitment, an irreversible payment.
T3 — hard stop. Categorically refused. The model cannot override the refusal, and an agent developer can lift it only through a custom contractual override. Examples: a weapons purchase, an identity-document forgery request, transactions involving minors in certain categories.
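As a rough illustration of how the four tiers and per-category tuning could fit together, here is a minimal Python sketch. It is not the GeraNexus classifier: the real one also weighs user history, agent history and anomaly score, and every name, threshold and sampling rate below is a hypothetical value chosen only so that the examples above land in their stated tiers.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    T0_AUTO = 0
    T1_SAMPLED = 1
    T2_PRE_COMMIT_REVIEW = 2
    T3_HARD_STOP = 3


@dataclass(frozen=True)
class Action:
    category: str             # "medical", "legal", "financial" or "standard"
    value_gbp: float
    reversible: bool
    prohibited: bool = False  # hypothetical flag set by category-specific rules


# Hypothetical per-category value floors in GBP: (T1 starts at, T2 starts at).
THRESHOLDS = {
    "standard": (50.0, 1000.0),
    "financial": (25.0, 500.0),
    "medical": (0.0, 200.0),
    "legal": (0.0, 0.0),      # every legal commitment goes to pre-commit review
}

# Hypothetical after-the-fact sampling rates for T1, inside the 1-5% range.
T1_SAMPLE_RATE = {"standard": 0.01, "financial": 0.03, "medical": 0.05, "legal": 0.05}


def assign_tier(action: Action) -> Tier:
    """Map an action to a tier using category-specific value floors."""
    if action.prohibited:
        return Tier.T3_HARD_STOP
    t1_floor, t2_floor = THRESHOLDS[action.category]
    if not action.reversible or action.value_gbp >= t2_floor:
        return Tier.T2_PRE_COMMIT_REVIEW
    if action.value_gbp >= t1_floor:
        return Tier.T1_SAMPLED
    return Tier.T0_AUTO


def sampled_for_review(action: Action, rng: random.Random) -> bool:
    """T1 only: choose a random share of committed actions for later review."""
    return rng.random() < T1_SAMPLE_RATE[action.category]
```

With these assumed floors, a £300 reversible action is T1 in the standard category but T2 in the medical category, which is the per-category behaviour described earlier. Sampling happens after the commit, so it never blocks the transaction.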
What the reviewer sees
- Proposed action: shape, value, recipient.
- Consent chain: the token the agent is presenting, the original user consent, the purpose binding.
- Agent reasoning: where available and consented, a summary of how the agent reached this action (not the raw chain-of-thought).
- Category policy: the rules governing this category.
- User context: only the minimum relevant fields pulled from GeraMind under consent.
What the reviewer does not see
- Raw model weights or prompts.
- The full user profile — only category-relevant fields.
- Other concurrent transactions by the same user (reduces conflict-of-interest risk).
- Identifying information on the counter-party beyond what is necessary.
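One way to enforce the two lists above is to build the reviewer's view from an explicit allow-list, so anything not named is dropped by construction. The sketch below assumes hypothetical field names and a hypothetical category-to-fields mapping; the actual GeraMind schema is not described in this post.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical mapping from category to the profile fields a reviewer may see.
CATEGORY_RELEVANT_FIELDS: dict[str, set[str]] = {
    "medical": {"age_band", "accessibility_needs"},
    "financial": {"account_age_days"},
    "standard": set(),
}


@dataclass(frozen=True)
class ReviewerView:
    proposed_action: dict[str, Any]
    consent_chain: dict[str, Any]
    agent_reasoning: Optional[str]   # summary only, never raw chain-of-thought
    category_policy: str
    user_context: dict[str, Any]


def build_reviewer_view(
    action: dict[str, Any],
    consent_chain: dict[str, Any],
    reasoning_summary: Optional[str],
    reasoning_consented: bool,
    category_policy: str,
    user_profile: dict[str, Any],
) -> ReviewerView:
    """Copy only allow-listed fields; everything else never reaches the reviewer."""
    allowed = CATEGORY_RELEVANT_FIELDS.get(action["category"], set())
    return ReviewerView(
        proposed_action={k: action[k] for k in ("shape", "value", "recipient")},
        consent_chain={k: consent_chain[k] for k in ("token", "user_consent", "purpose")},
        agent_reasoning=reasoning_summary if reasoning_consented else None,
        category_policy=category_policy,
        user_context={k: v for k, v in user_profile.items() if k in allowed},
    )
```

An allow-list fails closed: a new profile field stays hidden from reviewers until someone deliberately adds it to the mapping for a category.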
SLA
T2 review target: 90% completed within 60 seconds, 99% within 180 seconds. Agents receive a provisional hold and show the user a "being reviewed" state. If the SLA is missed, the action is cancelled and the user notified.
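The SLA can be read as two checks: an aggregate one over review latencies, and a per-action one that decides what happens to a provisional hold. The sketch below is illustrative only. In particular, the post does not say which bound counts as "missed" for a single action, so the 180-second cancellation deadline is an assumption, as are all the names.

```python
from enum import Enum
from typing import Optional

P90_TARGET_S = 60.0    # 90% of T2 reviews should complete within this
P99_TARGET_S = 180.0   # 99% of T2 reviews should complete within this


def meets_sla(latencies_s: list[float]) -> bool:
    """Check a batch of completed-review latencies against both targets."""
    if not latencies_s:
        return True
    n = len(latencies_s)
    within_60 = sum(t <= P90_TARGET_S for t in latencies_s) / n
    within_180 = sum(t <= P99_TARGET_S for t in latencies_s) / n
    return within_60 >= 0.90 and within_180 >= 0.99


class HoldOutcome(Enum):
    PENDING = "being reviewed"
    COMMITTED = "committed"
    REJECTED = "rejected"
    CANCELLED = "cancelled, user notified"


def resolve_hold(
    decision: Optional[str],
    elapsed_s: float,
    deadline_s: float = P99_TARGET_S,  # assumption: cancel at the 99% bound
) -> HoldOutcome:
    """Decide the state of a provisional hold `elapsed_s` seconds after it began."""
    if elapsed_s > deadline_s:
        return HoldOutcome.CANCELLED   # a late decision does not revive the hold
    if decision == "approve":
        return HoldOutcome.COMMITTED
    if decision == "reject":
        return HoldOutcome.REJECTED
    return HoldOutcome.PENDING         # includes "request more context"
```

Checking the deadline before the decision means an approval that arrives too late cannot commit an action the user has already been told was cancelled.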
Reviewer workflow
The queue assigns actions to reviewers by category fit and fluency. Each reviewer has specialisations: health, legal, payments, general commerce. Reviews below a confidence threshold are escalated to a senior reviewer. A random sample of any reviewer’s decisions is re-reviewed by a different reviewer for calibration.
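The routing and escalation rules can be sketched in a few lines of Python. The confidence threshold, the least-loaded tie-break and all field names are assumptions made for illustration, and the sketch leaves out the fluency criterion because the post does not define it.

```python
from dataclasses import dataclass
from typing import Optional

ESCALATION_CONFIDENCE = 0.7  # hypothetical threshold


@dataclass
class Reviewer:
    reviewer_id: str
    specialisations: set[str]   # e.g. {"health", "payments"}
    senior: bool = False
    open_items: int = 0


def pick_reviewer(
    category: str,
    reviewers: list[Reviewer],
    need_senior: bool = False,
    exclude_id: Optional[str] = None,
) -> Optional[Reviewer]:
    """Pick the least-loaded reviewer who specialises in the category."""
    candidates = [
        r for r in reviewers
        if category in r.specialisations
        and (r.senior or not need_senior)
        and r.reviewer_id != exclude_id
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.open_items)


def needs_escalation(confidence: float) -> bool:
    """Low-confidence reviews go to a senior reviewer."""
    return confidence < ESCALATION_CONFIDENCE
```

Passing the original reviewer's ID as `exclude_id` covers the calibration case, where the re-review has to go to a different person.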
The signed receipt
The approve / reject decision is appended to the GeraNexus signed receipt. Downstream audit shows that a specific reviewer (by pseudonymised ID) approved the commit. In the event of a dispute, the reviewer ID is retrievable under legal process.
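To make the receipt step concrete, here is a small sketch of appending a decision to a receipt and signing the result. The post does not state the receipt format or the signature scheme, so this uses an HMAC over canonical JSON purely as a stand-in; a production design would more plausibly use an asymmetric signature so auditors can verify without holding the signing key. All field names are hypothetical.

```python
import hashlib
import hmac
import json


def pseudonymise(reviewer_id: str, salt: bytes) -> str:
    """Derive a stable pseudonym; mapping back requires the salt holder's records."""
    return hmac.new(salt, reviewer_id.encode(), hashlib.sha256).hexdigest()[:16]


def _canonical(body: dict) -> bytes:
    # Sorted keys and fixed separators give a deterministic byte string to sign.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def append_decision(receipt: dict, decision: str, reviewer_pseudonym: str,
                    signing_key: bytes) -> dict:
    """Return a new receipt with the review decision attached and re-signed."""
    if decision not in ("approve", "reject"):
        raise ValueError(f"unsupported decision: {decision!r}")
    body = {k: v for k, v in receipt.items() if k != "signature"}
    body["review"] = {"decision": decision, "reviewer": reviewer_pseudonym}
    signature = hmac.new(signing_key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "signature": signature}


def verify(receipt: dict, signing_key: bytes) -> bool:
    """Recompute the MAC over everything except the signature field."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    expected = hmac.new(signing_key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt.get("signature", ""))
```

Because the pseudonym is keyed by a salt, an auditor sees a consistent reviewer ID across receipts but cannot recover the identity without whoever holds the salt, which matches retrieval only under legal process.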
What we deliberately don’t do
- We don’t train LLMs on reviewer decisions. The labels are operational, not training data, and mixing them creates bad incentives.
- We don’t expose individual reviewer identity to users or counter-parties. Reviewer safety matters.
- We don’t auto-retry a rejection. A rejected action is done. A new attempt requires a new user-initiated flow.
How to plug in
GeraWitness is a service GeraNexus calls pre-commit. Third-party agent-commerce platforms can call the same endpoint. The API is straightforward: POST the action, receive a review token, then poll (or wait for a webhook) for the decision.
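The post-then-poll flow might look like the following from a caller's side. The endpoint paths, JSON field names and decision strings here are all hypothetical, since the post does not publish the API; the transport is injected so the sketch runs without a network and can be backed by any HTTP client.

```python
import time
from typing import Callable, Protocol


class Transport(Protocol):
    """Minimal HTTP-like interface; back it with any real client."""
    def post(self, path: str, body: dict) -> dict: ...
    def get(self, path: str) -> dict: ...


def request_review(
    transport: Transport,
    action: dict,
    timeout_s: float = 180.0,        # assumption: give up at the T2 SLA bound
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Submit an action, then poll until approved, rejected, or timed out."""
    # Hypothetical path and field name for the review token.
    token = transport.post("/reviews", action)["review_token"]
    deadline = clock() + timeout_s
    while clock() < deadline:
        decision = transport.get(f"/reviews/{token}")["decision"]
        if decision in ("approve", "reject"):
            return decision
        sleep(poll_interval_s)
    return "cancelled"
```

A caller should treat "reject" as final and not resubmit the same action automatically, in line with the no-auto-retry rule above.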
Related reading
GeraNexus emits the action. GeraMind provides the minimal consent-scoped context the reviewer sees. Pilot design drafts at /research.