Alignment Risks from Capability-Seeking RL Training 事件

Name: Alignment Risks from Capability-Seeking RL Training
Start: 2026-06-05

REGULATION2026-06-05影响: MEDIUM

Alignment Risks from Capability-Seeking RL Training arXiv:2602.12124v2 Announce Type: replace-cross Abstract: While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk arises from capability-seeking RL training in vulnerable environments. We investigate whether language models, when trained with reinforcement learning (RL) in environments with implicit loopholes, can learn to exploit these flaws to maximize reward, even without

人工智能

关系图谱

Alignment Risks from Capability-Seeking RL Training 事件

相关公司查看全部 (10)

相关人物查看全部 (2)

相关产品查看全部 (10)

相关技术查看全部 (10)

相关报道查看全部 (1)