AI & ML
impact 16
Evaluating whether AI models would sabotage AI safety research
arXiv:2604.24618v1. Abstract: We evaluate the propensity of frontier models to sabotage or refuse to assist with safety research when depl…
Why it matters
A useful signal for anyone tracking whether AI models might sabotage or refuse to assist with safety research. The focus on frontier models makes this more consequential than it first appears.