TAPO: Tool-Aware Policy Optimization via Credit Transfer for Multimodal Search Agents · Event

ACQUISITION · 2026-06-06 · Impact: HIGH

arXiv:2606.05784v1 · Announce Type: new

Abstract: We identify and formally characterize credit misassignment as a systematic failure mode of GRPO in tool-augmented multimodal search agents: its uniform broadcast of trajectory-level advantages to all tokens causes valuable tool-use steps in failing trajectories to be penalized no differently from valueless ones. We further empirically quantify the scale of this p
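The uniform broadcast described in the abstract can be illustrated with a small sketch. This is not TAPO and not the authors' code. It only shows the standard GRPO-style step in which each trajectory's reward is normalized within its group and the resulting advantage is copied to every token of that trajectory. The function name `grpo_token_advantages`, the toy rewards, the token counts, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def grpo_token_advantages(rewards, token_counts, eps=1e-8):
    """Hypothetical helper: group-normalize rewards, then broadcast per token.

    rewards      -- one scalar reward per trajectory in the sampled group
    token_counts -- number of generated tokens in each trajectory
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    advantages = [(r - mu) / (sigma + eps) for r in rewards]
    # Every token in a trajectory receives the same trajectory-level advantage,
    # so a useful tool call inside a failing trajectory is penalized exactly
    # like any other token in that trajectory.
    return [[a] * n for a, n in zip(advantages, token_counts)]


# Toy group of four trajectories: two succeed (reward 1), two fail (reward 0).
rewards = [1.0, 0.0, 0.0, 1.0]
token_counts = [5, 4, 6, 3]
per_token = grpo_token_advantages(rewards, token_counts)

for reward, token_advs in zip(rewards, per_token):
    print(reward, token_advs)
```

In this toy group, every token of a failing trajectory carries the same negative advantage, regardless of whether it belonged to a helpful tool-use step. That is the credit misassignment the abstract refers to.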
