AI & ML
impact 16
Characterizing the Consistency of the Emergent Misalignment Persona
arXiv:2604.28082v1 (announce type: new). Abstract: Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned…
Why it matters
If fine-tuning on narrowly misaligned data generalizes to broad misalignment, then how consistent the resulting misaligned persona is becomes a practical question for anyone fine-tuning LLMs.