Cybersecurity
impact 16
RepIt: Steering Language Models with Concept-Specific Refusal Vectors
arXiv:2509.13281v5. Abstract: Current safety evaluations of language models rely on benchmark-based assessments that may miss l…
Why it matters
The language-model community is likely to debate this. Watch how model developers respond in the coming weeks.