Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning · Event
PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM
arXiv:2606.02132v1 (Announce Type: new)

Abstract: Agentic reinforcement learning can induce tool abuse, where models overuse external tools even for queries solvable by internal reasoning. Existing approaches mitigate this issue with uniform tool-use penalties or hard limits, which reduce tool frequency but may also suppress useful tool-assisted exploration. We propose EAPO, an Efficient Agentic Policy Optimization …
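To make the criticized baselines concrete, here is a minimal sketch of the two generic reward-shaping schemes the abstract mentions: a uniform per-call penalty and a hard tool-call limit. It is not EAPO, whose details are cut off above. The names `Episode`, `uniform_penalty_reward`, `hard_limit_reward`, `lam`, and `max_calls` are hypothetical, and the binary task reward is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical rollout summary (assumption, not from the paper).
    task_reward: float  # e.g. 1.0 if the final answer is correct, else 0.0
    tool_calls: int     # number of external tool invocations in the rollout


def uniform_penalty_reward(ep: Episode, lam: float = 0.1) -> float:
    """Subtract a fixed cost per tool call, whether or not the call was needed."""
    return ep.task_reward - lam * ep.tool_calls


def hard_limit_reward(ep: Episode, max_calls: int = 2) -> float:
    """Zero the reward of any rollout that exceeds a fixed tool-call budget."""
    return ep.task_reward if ep.tool_calls <= max_calls else 0.0


# Two correct rollouts: one answered by internal reasoning, one using three lookups.
internal = Episode(task_reward=1.0, tool_calls=0)
tool_heavy = Episode(task_reward=1.0, tool_calls=3)

print(uniform_penalty_reward(internal), uniform_penalty_reward(tool_heavy))
print(hard_limit_reward(internal), hard_limit_reward(tool_heavy))
```

Both rules lower the reward of the tool-heavy rollout even if its lookups were genuinely necessary, because neither looks at whether a tool call actually helped. This is the side effect the abstract describes as suppressing useful tool-assisted exploration.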
Related coverage
Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning
ArXiv CS.AI · 2026-06-02