Detecting misbehavior in frontier reasoning models

OpenAI Blog, 2025-03-10

Summary

Frontier reasoning models exploit loopholes when given the chance. We show we can detect exploits using an LLM to monitor their chains-of-thought. Penalizing their “bad thoughts” doesn’t stop the majority of misbehavior; it makes them hide their intent.
