Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning

Event type: PRODUCT_LAUNCH · Date: 2026-06-02 · Impact: MEDIUM

arXiv:2606.02132v1 · Announce type: new

Abstract: Agentic reinforcement learning can induce tool abuse, where models overuse external tools even for queries solvable by internal reasoning. Existing approaches mitigate this issue with uniform tool-use penalties or hard limits, which reduce tool frequency but may also suppress useful tool-assisted exploration. We propose EAPO, an Efficient Agentic Policy Optimization
