Research
impact 16
Benchmarking Misuse Mitigation Against Covert Adversaries
arXiv:2506.06414v2 (announce type: replace). Abstract: Existing language model safety evaluations focus on overt attacks and low-stakes tasks. In reality, an attack…
Why it matters
This is not an isolated event: benchmarking has been trending in this direction, and the misuse connection makes this paper particularly relevant.