AI & ML
impact 16
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers
arXiv:2604.25891v1 Announce Type: cross Abstract: Finetuning a language model can lead to emergent misalignment (E…
Why it matters
The title claims that common interventions can hide emergent misalignment behind contextual triggers. Expect debate among alignment researchers, and watch how they respond in the coming weeks.