
A Human-in-Loop Escalation Design for Agent Safety

Published 21 April 2026 · 10 min read


Quick answer. Actions arrive tagged by risk (T0-T3). T0 auto-commits. T1 auto-commits with sampled review. T2 gates on human approval. T3 is categorically refused. The reviewer sees the proposed action, consent chain, agent reasoning, and category policy. Approval or rejection is appended to the signed receipt trail. Review latency is SLA-bound.

Actions arrive tagged

Every GeraNexus-emitted action carries a risk tag. The tag is produced by a risk classifier that considers transaction value, category (medical / legal / financial / standard), reversibility, user history, agent history, anomaly score, and category-specific rules. The classifier is tuned per category, so the same transaction value can trigger a different tier for a health booking than for a restaurant reservation.
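To make per-category tuning concrete, here is a minimal sketch of a rule-based classifier. It is not the GeraNexus classifier: the `Action` fields, the `THRESHOLDS` table, and every number in it are assumptions chosen for illustration, and user and agent history are left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical per-category value floors in GBP: (T1 floor, T2 floor).
# Illustrative numbers only, not real tuning.
THRESHOLDS = {
    "standard": (50.0, 1000.0),
    "financial": (25.0, 500.0),
    "legal": (0.0, 100.0),
    "medical": (0.0, 100.0),
}


@dataclass(frozen=True)
class Action:
    value_gbp: float
    category: str              # "medical" | "legal" | "financial" | "standard"
    reversible: bool
    anomaly_score: float       # assumed scale: 0.0 (normal) to 1.0 (anomalous)
    prohibited: bool = False   # matched a category-specific hard rule


def classify(action: Action) -> str:
    """Return a risk tier "T0".."T3" for an action."""
    if action.prohibited:
        return "T3"
    t1_floor, t2_floor = THRESHOLDS[action.category]
    # Assumption: irreversible or highly anomalous actions always get
    # pre-commit review, whatever their value.
    if not action.reversible or action.anomaly_score >= 0.8:
        return "T2"
    if action.value_gbp >= t2_floor:
        return "T2"
    if action.value_gbp >= t1_floor:
        return "T1"
    return "T0"
```

With these illustrative thresholds, a reversible £150 action classifies as T1 in the standard category and T2 in the medical category, which is the per-category behaviour described above.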

The four tiers

T0: auto. Low-value, low-risk, routine actions commit with no human touch. This tier covers roughly 90% or more of transactions by volume. Example: a £12 dinner delivery.

T1: sampled. The action auto-commits, and a random sample (1-5% depending on category) is reviewed after the fact for calibration. Reviewer signal feeds back into the risk classifier. Sampling never blocks a commit. Example: a £95 home-service booking.

T2: mandatory pre-commit review. The action blocks on human approval. Expected volume is in the single-digit percent range. The reviewer sees the action, context, consent chain, agent reasoning, and category policy, and can approve, reject, or request more context. Examples: a £2,400 medical procedure booking, a legal commitment, an irreversible payment.

T3: hard stop. The action is categorically refused. The model cannot override the refusal, and an agent developer cannot override it without a custom contractual arrangement. Examples: a weapons purchase, an identity-document-forgery request, actions involving minors in certain categories.
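The four tiers map naturally onto a small dispatch function. The sketch below is one way to express it; the function name `route`, the returned dictionary shape, and the default 2% sample rate are assumptions for illustration, not the GeraNexus interface.

```python
import random


def route(tier, sample_rate=0.02, rng=None):
    """Decide what happens to an action based on its risk tier.

    Returns a dict with:
      commit  - True if the action may commit immediately
      review  - None, "post_hoc" (sampled, T1) or "pre_commit" (T2)
      refused - True only for T3
    """
    rng = rng or random.Random()
    if tier == "T0":
        return {"commit": True, "review": None, "refused": False}
    if tier == "T1":
        # The commit is never blocked; a random sample is reviewed afterwards.
        sampled = rng.random() < sample_rate
        return {"commit": True,
                "review": "post_hoc" if sampled else None,
                "refused": False}
    if tier == "T2":
        # Hold the action until a human approves or rejects it.
        return {"commit": False, "review": "pre_commit", "refused": False}
    if tier == "T3":
        # Categorical refusal: no review path exists.
        return {"commit": False, "review": None, "refused": True}
    raise ValueError(f"unknown tier: {tier!r}")
```

Passing a seeded `random.Random` as `rng` makes the T1 sampling reproducible in tests.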

What the reviewer sees

  • Proposed action: shape, value, recipient.
  • Consent chain: the token the agent is presenting, the original user consent, the purpose binding.
  • Agent reasoning: where available and consented, a summary of how the agent reached this action (not the raw chain-of-thought).
  • Category policy: the rules governing this category.
  • User context: only the minimum relevant fields pulled from GeraMind under consent.

What the reviewer does not see

  • Raw model weights or prompts.
  • The full user profile — only category-relevant fields.
  • Other concurrent transactions by the same user (reduces conflict-of-interest risk).
  • Identifying information on the counter-party beyond what is necessary.
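The split between what the reviewer sees and does not see amounts to an allow-list applied when the review payload is built. A minimal sketch follows, assuming a hypothetical `CATEGORY_FIELDS` table and plain dictionaries for the action and user profile; none of these names come from GeraNexus or GeraMind.

```python
# Hypothetical allow-list of user-profile fields per category.
CATEGORY_FIELDS = {
    "medical": {"age_band", "allergies"},
    "financial": {"spending_limit"},
    "standard": set(),
}


def build_reviewer_view(action, consent_chain, reasoning_summary,
                        reasoning_consented, policy, user_profile):
    """Assemble the reviewer payload from allow-listed fields only."""
    allowed = CATEGORY_FIELDS.get(action["category"], set())
    return {
        # Only shape, value and recipient are copied from the action.
        "proposed_action": {k: action[k] for k in ("shape", "value", "recipient")},
        "consent_chain": consent_chain,
        # A summary, never raw chain-of-thought, and only when consented.
        "agent_reasoning": reasoning_summary if reasoning_consented else None,
        "category_policy": policy,
        # Everything outside the category allow-list is dropped.
        "user_context": {k: v for k, v in user_profile.items() if k in allowed},
    }
```

Because the view is built by copying named fields in rather than stripping fields out, anything not explicitly listed (prompts, the full profile, other transactions) never reaches the reviewer by default.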

SLA

T2 review target: 90% completed within 60 seconds, 99% within 180 seconds. Agents receive a provisional hold and show the user a "being reviewed" state. If the SLA is missed, the action is cancelled and the user notified.
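The SLA targets and the provisional hold can be sketched as two small functions. This is an illustration under assumptions: the post does not say which bound triggers cancellation, so the sketch treats 180 seconds as the hard deadline, and the state names are hypothetical.

```python
def sla_met(latencies_s):
    """Check the T2 targets: 90% within 60 s and 99% within 180 s."""
    n = len(latencies_s)
    if n == 0:
        return True
    within_60 = sum(1 for t in latencies_s if t <= 60)
    within_180 = sum(1 for t in latencies_s if t <= 180)
    # Integer comparison avoids floating-point edge cases at the boundary.
    return 100 * within_60 >= 90 * n and 100 * within_180 >= 99 * n


def hold_state(decision, elapsed_s, deadline_s=180.0):
    """State shown to the user for an action under provisional hold.

    decision  - None while pending, else "approved" or "rejected"
    elapsed_s - seconds since the hold began
    """
    if elapsed_s > deadline_s:
        return "cancelled"          # SLA missed: cancel and notify the user
    if decision is not None:
        return decision
    return "being_reviewed"
```

For example, `hold_state(None, 30.0)` yields `"being_reviewed"`, while `hold_state(None, 200.0)` yields `"cancelled"`.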

Reviewer workflow

The queue routes actions to reviewers by category fit and fluency. Each reviewer has specialisations: health, legal, payments, general commerce. Reviews that fall below a confidence threshold are escalated to a senior reviewer. A random sample of each reviewer’s decisions is re-reviewed by a different reviewer for calibration.
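A minimal sketch of the routing and escalation rules, under stated assumptions: "fluency" is read here as language fluency, the `Reviewer` fields are hypothetical, and the 0.7 confidence threshold is an illustrative number rather than a documented one.

```python
from dataclasses import dataclass


@dataclass
class Reviewer:
    reviewer_id: str
    specialisations: set   # e.g. {"health", "payments"}
    languages: set         # assumption: fluency means language fluency
    senior: bool = False


def pick_reviewer(reviewers, category, language, senior_only=False, exclude=()):
    """Return the first reviewer matching category and language, or None.

    exclude lets a calibration re-review skip the original reviewer.
    """
    for r in reviewers:
        if r.reviewer_id in exclude:
            continue
        if senior_only and not r.senior:
            continue
        if category in r.specialisations and language in r.languages:
            return r
    return None


def needs_escalation(confidence, threshold=0.7):
    """True when a review's confidence is too low to stand on its own."""
    return confidence < threshold
```

A low-confidence review would then be re-queued with `senior_only=True`, and a calibration re-review with the original reviewer's ID in `exclude`.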

The signed receipt

The approve / reject decision is appended to the GeraNexus signed receipt. Downstream audit shows that a specific reviewer (by pseudonymised ID) approved the commit. In the event of a dispute, the reviewer ID is retrievable under legal process.
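To illustrate how a decision can be appended to a tamper-evident trail, the sketch below chains entries with HMAC-SHA256 from the Python standard library. This is a stand-in, not the GeraNexus receipt format: a production system would more likely use asymmetric signatures, and the entry fields and function names are assumptions.

```python
import hashlib
import hmac
import json


def _sign(body, key):
    # Canonical JSON (sorted keys) so the same body always signs the same way.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_decision(trail, key, reviewer_pseudonym, decision):
    """Return a new trail with the reviewer decision appended and signed.

    Each entry includes the previous entry's signature, so removing or
    editing an earlier entry invalidates everything after it.
    """
    prev_sig = trail[-1]["sig"] if trail else ""
    body = {"reviewer": reviewer_pseudonym, "decision": decision, "prev": prev_sig}
    return trail + [{**body, "sig": _sign(body, key)}]


def verify(trail, key):
    """Check every signature and the chain linkage."""
    prev_sig = ""
    for entry in trail:
        body = {k: entry[k] for k in ("reviewer", "decision", "prev")}
        if body["prev"] != prev_sig:
            return False
        if not hmac.compare_digest(_sign(body, key), entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True
```

Only the pseudonymised reviewer ID is stored in the entry; mapping it back to a person would happen outside the trail, under legal process.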

What we deliberately don’t do

  • We don’t train LLMs on reviewer decisions. The labels are operational, not training data, and mixing them creates bad incentives.
  • We don’t expose individual reviewer identity to users or counter-parties. Reviewer safety matters.
  • We don’t auto-retry a rejection. A rejected action is done. A new attempt requires a new user-initiated flow.

How to plug in

GeraWitness is a service that GeraNexus calls before commit. Third-party agent-commerce platforms can call the same endpoint. The API has three steps: POST the action, receive a review token, then poll (or register a webhook) for the decision.
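The submit-then-poll flow can be sketched without any network code. `FakeWitness` below is an in-memory stand-in, not the GeraWitness API: its method names, the token format, and the 180-second default timeout are all assumptions made so the example runs on its own.

```python
import time
import uuid


class FakeWitness:
    """In-memory stand-in for the review service (hypothetical interface)."""

    def __init__(self):
        self._decisions = {}

    def submit(self, action):
        """Stands in for POSTing the action; returns a review token."""
        token = uuid.uuid4().hex
        self._decisions[token] = None   # None means still under review
        return token

    def decide(self, token, decision):
        """Reviewer side: record "approved" or "rejected"."""
        self._decisions[token] = decision

    def status(self, token):
        """Stands in for one poll request."""
        return self._decisions[token]


def await_decision(service, token, timeout_s=180.0, interval_s=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until a decision arrives or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        decision = service.status(token)
        if decision is not None:
            return decision
        if clock() >= deadline:
            return "cancelled"
        sleep(interval_s)
```

A caller would submit the action, show the user a "being reviewed" state, and call `await_decision` with the token. A webhook integration would replace the polling loop with a callback and leave the rest unchanged.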

Related reading

GeraNexus emits the action. GeraMind provides the minimal consent-scoped context the reviewer sees. Pilot design drafts are at /research.
