
Protocol Deep-Dive: Risk Tiers, Review SLA, Reviewer Accountability

Published 21 April 2026 · 12 min read


Quick answer. Four risk tiers (automatic, sampled, mandatory, hard-stop) with SLAs of none / 24h / 30s / N/A. Reviewers are credentialed, trained against a rubric, dual-reviewed on a sample, and personally accountable for decisions they sign. Every review is a signed decision that enters the GeraNexus receipt chain as evidence.

Why tiered

If every agent action went to a human, the economics would collapse. If none did, the worst-case mistakes would compound. Tiering lets the system spend review cost on the actions that most benefit from review, while keeping the common case cheap.

The four tiers

Tier 0 — automatic (no review)

Read-only actions, low-value transactions under the user’s configured threshold, actions with prior signed consent for identical-shape transactions. SLA: none. Throughput: unlimited.

Tier 1 — sampled review (asynchronous)

Mid-risk transactions: ~5% sampled uniformly, ~15% sampled adversarially (novel pattern, first-time marketplace, edge-case flag). SLA: 24 hours. The review happens after the action and annotates the audit log; it exists for learning and pattern detection, not to block the action.
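The sampling policy above can be sketched as a small decision function. This is an illustrative sketch, not the production rule set: the field names (`novel_pattern`, `first_time_marketplace`, `edge_case_flag`) and the exact rates are assumptions taken from the percentages quoted above.

```python
import random

UNIFORM_RATE = 0.05      # ~5% of ordinary mid-risk actions sampled uniformly
ADVERSARIAL_RATE = 0.15  # ~15% sampled when an adversarial signal is present

def should_sample(action: dict) -> bool:
    """Decide whether a Tier 1 action enters the asynchronous review queue."""
    adversarial = bool(
        action.get("novel_pattern")
        or action.get("first_time_marketplace")
        or action.get("edge_case_flag")
    )
    rate = ADVERSARIAL_RATE if adversarial else UNIFORM_RATE
    return random.random() < rate
```

Note the asymmetry: an action with any adversarial signal is three times more likely to be pulled for review than one sampled purely at random.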

Tier 2 — mandatory review (synchronous)

High-value transactions (above configurable thresholds), first-time marketplace-user pairs in sensitive verticals, any action where a minor is part of the transaction context, medical bookings with safeguarding flags, cross-border payments above £500. SLA: median review in under 30 seconds, p95 under 2 minutes, 24/7 staffing. The agent holds the commit until a signed reviewer decision returns.

Tier 3 — hard stop

Actions the protocol refuses outright: known-fraud marketplace patterns, actions carrying a signed distress flag from the user’s vault, legally prohibited flows. No review — the action is refused with a reason logged and an appeal path offered.

How an action is classified

A rule engine ingests the agent’s proposed action, the marketplace descriptor, the consent token, and the user’s preference profile. It returns a tier and a reason trace. The rule engine is open and versioned; changes go through a public review process.
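The classification step can be sketched as a strictest-first rule cascade that returns both a tier and the trace of which rule fired. This is a minimal illustration, assuming hypothetical field names (`known_fraud_pattern`, `value_gbp`, `involves_minor`, and the threshold keys); the real open, versioned rule engine would be far richer.

```python
from enum import IntEnum

class Tier(IntEnum):
    AUTOMATIC = 0   # no review
    SAMPLED = 1     # asynchronous sampled review
    MANDATORY = 2   # synchronous review, agent holds the commit
    HARD_STOP = 3   # refused outright

def classify(action: dict, marketplace: dict, consent: dict, profile: dict):
    """Return (tier, reason_trace) for a proposed action.

    Rules are evaluated strictest-first, so the highest applicable
    tier always wins over a cheaper one.
    """
    trace = []
    if marketplace.get("known_fraud_pattern") or action.get("distress_flag"):
        trace.append("hard-stop rule matched")
        return Tier.HARD_STOP, trace
    if action["value_gbp"] > profile["high_value_threshold"] or action.get("involves_minor"):
        trace.append("mandatory-review rule matched")
        return Tier.MANDATORY, trace
    if action.get("read_only") or action["value_gbp"] < profile["low_value_threshold"]:
        trace.append("automatic rule matched")
        return Tier.AUTOMATIC, trace
    trace.append("default: sampled review")
    return Tier.SAMPLED, trace
```

Evaluating strictest-first matters: it guarantees that adding a new lenient rule can never accidentally downgrade an action that a stricter rule would have caught.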

Reviewer training

Reviewers are credentialed (background-checked, trained for 80 hours on the rubric, case library, and ethics). Every reviewer has a specialty — health, finance, family/safeguarding, cross-border — and is routed only cases in that specialty.

Every reviewer’s first 200 decisions are dual-reviewed by a senior reviewer. After certification, a 5% ongoing sample is dual-reviewed to detect drift. Reviewers who consistently disagree with senior review are retrained or rotated.
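One simple drift signal from that ongoing dual-review sample is the disagreement rate between a reviewer and the senior reviewer over the same cases. A minimal sketch, assuming a hypothetical flagging threshold (the 10% figure here is illustrative, not from the protocol):

```python
def drift_flag(dual_reviews: list, threshold: float = 0.10) -> bool:
    """Flag a reviewer whose disagreement rate with senior review
    exceeds the threshold.

    dual_reviews: list of (reviewer_verdict, senior_verdict) pairs
    drawn from the ongoing 5% dual-review sample.
    """
    if not dual_reviews:
        return False
    disagreements = sum(1 for own, senior in dual_reviews if own != senior)
    return disagreements / len(dual_reviews) > threshold
```

A production system would likely also weight by case severity and track the trend over time, not just a point-in-time rate.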

The review UI

Reviewers see the agent’s proposed action, the relevant slices of the user’s preference profile (via a minimised-read from GeraMind), the marketplace reputation record, and a rubric-scored risk summary. They choose approve / reject / escalate / clarify. Every decision is signed with the reviewer’s credential key.
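Signing with the reviewer’s credential key can be sketched as follows. This uses a symmetric HMAC purely as a self-contained stand-in; a real deployment would use an asymmetric scheme (e.g. Ed25519) so that anyone can verify a decision without holding the signing key. The field names are illustrative assumptions.

```python
import hashlib
import hmac
import json

def sign_decision(decision: dict, credential_key: bytes) -> dict:
    """Attach a signature over a canonical serialization of the decision."""
    payload = json.dumps(decision, sort_keys=True).encode()
    sig = hmac.new(credential_key, payload, hashlib.sha256).hexdigest()
    return {**decision, "signature": sig}

def verify_decision(signed: dict, credential_key: bytes) -> bool:
    """Recompute the signature over everything except the signature field."""
    body = {k: v for k, v in signed.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(credential_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])
```

The canonical serialization (`sort_keys=True`) matters: without it, two semantically identical decisions could produce different signatures.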

Reviewer accountability

Signed decisions are linkable to the reviewer identity (not shown to the user by default but available to the audit panel). Reviewers personally carry some accountability for their decisions — clearly-negligent approvals can lead to retraining, suspension, or removal from the panel. This is controversial; it is also the thing that keeps the review from degrading into rubber-stamping.

Integration with GeraNexus receipts

Every review decision becomes a review_receipt appended to the GeraNexus receipt chain. Disputes arrive at arbitration with the complete chain including the review decision. If the reviewer was wrong, that is evidence; if right, that is evidence.
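The append step can be sketched as a hash-linked chain: each review_receipt commits to the hash of the receipt before it, so a tampered or deleted entry breaks every later link. This is an illustrative sketch of the linking mechanic, not the actual GeraNexus receipt schema.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder for an empty chain

def append_review_receipt(chain: list, decision: dict) -> list:
    """Append a review_receipt linked by hash to the previous receipt."""
    prev_hash = chain[-1]["receipt_hash"] if chain else GENESIS_HASH
    body = {"type": "review_receipt", "decision": decision, "prev_hash": prev_hash}
    serialized = json.dumps(body, sort_keys=True).encode()
    body["receipt_hash"] = hashlib.sha256(serialized).hexdigest()
    chain.append(body)
    return chain
```

Because each receipt’s hash covers `prev_hash`, arbitration can verify that the review decision it is shown sits in the same unbroken chain as the transaction it judged.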

What we refuse

  • Sampling on Tier 2: if the action is mandatory-review, it is not sampled. Every single case is reviewed.
  • Review-as-a-nudge: we do not let marketplaces buy a “preferred review lane” that reduces friction. Review serves the user, not the marketplace.
  • AI-only review of Tier 2 actions: LLMs assist reviewers; they do not replace them at this tier.

Open design questions

  • Cultural calibration. A reviewer in one country may apply a different risk threshold than another. We rotate cases across regional pools to detect this.
  • Reviewer burnout. Mandatory 4h shifts maximum, enforced breaks, mental-health support, and a content-warning system for distressing cases.
  • Costs. Tier 2 SLAs are expensive. We subsidise strategically for sensitive verticals (health, safeguarding) and fund the bulk from Gera-platform margin.

Help design agent safety that scales.

Join the waitlist