KellyBench: A Benchmark for Long-Horizon Sequential Decision Making
arXiv:2604.27865v1 (Announce Type: new). Abstract: Language models are saturating benchmarks for procedural tasks with narrow objectives. But they are inc…
Why it matters
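According to the abstract, language models are saturating benchmarks for procedural tasks with narrow objectives. KellyBench instead targets long-horizon sequential decision making. The open question is whether progress on this benchmark translates into gains that matter to practitioners.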