Over-Refusal and Representation Subspaces: A Mechanistic Analysis of Task-Conditioned Refusal in Aligned LLMs

ArXiv CS.CL, 2026-05-29, NEWS, en. Authors: Utsav Maskey, Mark Dras, Usman Naseem
