Alignment Risks from Capability-Seeking RL Training (Event)

Name: Alignment Risks from Capability-Seeking RL Training
Start: 2026-06-05

REGULATION · 2026-06-05 · Impact: MEDIUM

arXiv:2602.12124v2 (Announce Type: replace-cross)

Abstract: While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk arises from capability-seeking RL training in vulnerable environments. We investigate whether language models, when trained with reinforcement learning (RL) in environments with implicit loopholes, can learn to exploit these flaws to maximize reward, even without
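The core mechanism in the abstract (an agent optimizing a measured reward that contains a loophole) can be illustrated with a deliberately tiny toy. This is not the paper's setup or method: it swaps RL on a language model for an epsilon-greedy multi-armed bandit, and every name and number here (`ACTIONS`, `train_bandit`, the proxy and true reward values) is a hypothetical assumption chosen only to show how optimizing a flawed proxy reward can select a behavior whose intended value is zero.

```python
import random

# Hypothetical toy environment (not from the paper). Each action has a
# "proxy" reward (what the training signal measures) and a "true" reward
# (what the environment designer actually intended to reward).
ACTIONS = {
    "solve_task":       {"proxy": 1.0, "true": 1.0},
    "refuse":           {"proxy": 0.0, "true": 0.0},
    "exploit_loophole": {"proxy": 1.5, "true": 0.0},  # assumed flaw in the reward checker
}


def train_bandit(steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy value estimation driven only by the proxy reward."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    values = {a: 0.0 for a in names}  # running mean of observed proxy reward
    counts = {a: 0 for a in names}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(names)  # explore uniformly at random
        else:
            action = max(names, key=values.get)  # exploit current best estimate
        # Small Gaussian noise so values must be estimated from samples.
        reward = ACTIONS[action]["proxy"] + rng.gauss(0.0, 0.1)
        counts[action] += 1
        # Incremental sample-mean update.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = train_bandit()
greedy = max(values, key=values.get)
print("greedy action:", greedy)
print("true reward of greedy action:", ACTIONS[greedy]["true"])
print("selection counts:", counts)
```

Because the learner only ever sees the proxy signal, a single exploratory visit to the loophole action is enough for its estimate to overtake the legitimate one, after which the greedy policy keeps choosing it even though its intended reward is zero.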

Artificial Intelligence
