AI & ML
impact 16
Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
arXiv:2604.27019v1 (cross-listed). Abstract: Safety-aligned language models must refuse harmful requests without collapsing into broad over-refusal, but…
Why it matters
The timing matters: work on dynamic fine-tuning is converging with shifts in adversarial methods, which could amplify the downstream impact.