Policy
impact 16
Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-Codex
arXiv:2604.14858v2. Announce type: replace. Abstract: As agent systems move into increasingly diverse executi…
Why it matters
A useful signal for anyone monitoring agent trajectory safety. The paper contributes two benchmarks, ATBench-Claw and ATBench-Codex, for evaluating and diagnosing the safety of agent trajectories in OpenClaw and Codex.