GeraWitness vs. Scale Surge vs. In-house Trust & Safety: Adjacent Approaches Compared
Published 21 April 2026 · 9 min read
Scale Surge
Scale AI’s Surge offering provides credentialed human labelling and review for AI-training and content-moderation tasks. High quality, strong tooling, responsive ops.
Overlap: credentialed human review at scale.
Gap: built for data labelling, not real-time transaction blocking. No transaction-receipt integration. No accountability-per-signed-decision structure for disputed commerce actions. SLAs measured in hours, not seconds.
Relationship: complementary for training data; different product for synchronous action review.
In-house Trust & Safety (Meta, Google, etc.)
Platform T&S teams are the longest-running human- oversight functions in tech. Deep rubrics, strong legal context, tough working conditions.
Overlap: human review of platform actions.
Gap: platform-scoped, not protocol-scoped. A Meta reviewer cannot review a GeraClinic booking. Cross- platform transactions have no shared review surface. Worker conditions have historically been uneven.
Relationship: inspiration and cautionary tale. We learn from T&S practice and actively avoid its worst labour conditions.
Open-source moderation toolchains
Projects like Perspective API, Rekognition moderation, OpenSentinel. Strong on content classification; not designed for commerce-transaction review.
Relationship: candidate components of the pre-filter stage that routes actions to tiers.
Anthropic Constitutional AI and OpenAI Moderation
Covered in an earlier post. Short version: behaviour-shaping at inference, not action- blocking at commit. Layered, not replaced.
Where we are genuinely different
- Synchronous action blocking. The transaction holds until a signed decision returns; review is not post-hoc.
- Reviewer accountability. Decisions are signed with credential keys, linked to the reviewer, subject to senior-review audit.
- Cross-platform protocol. One review surface for actions against GeraClinic, GeraHome, any GeraNexus-compliant marketplace.
- Labour-conditions commitment. 4h max shifts, enforced breaks, content warnings, mental-health support, refuseable cases.
Where we might be wrong
- Reviewer capacity may not scale to demand during a spike.
- Accountability can pressure reviewers into over- refusing; calibration is ongoing.
- Credentialing reviewers is operationally expensive; unit economics at low-margin verticals may not work.
Cooperative future
The right 2030 stack: Scale Surge (or equivalent) for training-data labelling, Anthropic / OpenAI alignment for inference shaping, in-house T&S for platform-specific policy, GeraWitness for protocol-wide synchronous action review, GeraNexus for the transactional state that makes review tractable.
Help design agent safety that scales.
Join the waitlist