Low-Resource Safety Failures Are Action Failures, Not Representation Failures
Event: PRODUCT_LAUNCH, 2026-06-02. Impact: MEDIUM
arXiv:2606.01196v1. Announce Type: new.
Abstract: Safety alignment learned in high-resource languages transfers poorly to low-resource languages. Models refuse harmful prompts in English but fail to refuse when the same prompts are translated into Swahili or Burmese. Adaptive steering methods like AdaSteer and CAST inherit this failure cross-lingually. We diagnose where transfer breaks down. Across Qwen2.5-7B, Gemma-2-9