Engineering
impact 12
SWE-bench Verified no longer measures frontier coding capabilities
Why it matters
The timing matters: if SWE-bench Verified no longer separates frontier models on coding ability, comparisons that still rely on its scores become less informative, which could amplify the downstream impact.