LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks
Event: PRODUCT_LAUNCH, 2026-05-28, Impact: MEDIUM
arXiv:2605.27375v1 Announce Type: new
Abstract: Large Language Models (LLMs) are increasingly acting as autonomous agents, but their continuous interaction with the environment can lead to in-context reward hacking (ICRH), a phenomenon where LLMs iteratively optimize their behavior to maximize proxy objectives, inadvertently producing harmful side effects. Existing defense methods are insufficient to address this