Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use

ArXiv CS.CL | 2026-05-28 | NEWS | en | Authors: Abhijit Kumar, Zoey Wu, Mohit Suley

Details

Source site: ArXiv CS.CL
Authors: Abhijit Kumar, Zoey Wu, Mohit Suley
Article type: NEWS
Language: en
Publication date: 2026-05-28

Abstract

arXiv:2605.27788v1 (cross-listed). Humans know when to reach for help: $347 \times 28$ warrants a calculator, while $2+2$ does not. Language models do not. Prompt-based approaches can instruct a model when to invoke tools, but this scaffolding does not teach it to recognize the boundary of its own knowledge. RL approaches that assign a single outcome reward to the whole trajectory fare no better: trajectory-level credit can neither isolate which tool call in a successful episode actually helped nor penalize unnecessary calls. We propose CARL (Competence-Aware Reinforcement Learning), which trains a critic on the model's own rollouts to learn where parametric knowledge suffices and where it needs external help. By decomposing each rollout at natural tool-use boundaries (e.g.
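To make the contrast between trajectory-level and segment-level credit concrete, here is a minimal Python sketch. It is not the CARL algorithm, whose details are not given in the abstract as captured here. It assumes a rollout is a list of (kind, text) steps, that each tool call closes a segment, and that some critic supplies a success estimate at the start of every segment. The function names, the step encoding, and the critic values below are all hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical step encoding: kind is "think" or "tool".
Step = Tuple[str, str]


def split_at_tool_calls(steps: Sequence[Step]) -> List[List[Step]]:
    """Cut a rollout into segments, closing a segment after each tool call."""
    segments: List[List[Step]] = []
    current: List[Step] = []
    for step in steps:
        current.append(step)
        if step[0] == "tool":
            segments.append(current)
            current = []
    if current:  # trailing reasoning after the last tool call
        segments.append(current)
    return segments


def trajectory_level_credit(n_segments: int, outcome: float) -> List[float]:
    """Every segment receives the same final outcome reward."""
    return [outcome] * n_segments


def segment_level_credit(
    boundary_values: Sequence[float], outcome: float
) -> List[float]:
    """Credit each segment with the change in the critic's estimate across it.

    boundary_values[k] is the (assumed) critic estimate of success at the
    start of segment k; the final outcome closes the last segment.
    """
    values = list(boundary_values) + [outcome]
    return [values[k + 1] - values[k] for k in range(len(boundary_values))]


rollout: List[Step] = [
    ("think", "347 * 28 is hard to do reliably in my head"),
    ("tool", "calculator(347 * 28)"),
    ("think", "2 + 2 is 4"),
    ("tool", "calculator(2 + 2)"),  # an unnecessary call
    ("think", "final answer"),
]
segments = split_at_tool_calls(rollout)

# Hypothetical critic estimates at the start of each of the three segments.
critic_values = [0.25, 1.0, 1.0]

uniform = trajectory_level_credit(len(segments), outcome=1.0)
per_segment = segment_level_credit(critic_values, outcome=1.0)
print(uniform)      # [1.0, 1.0, 1.0]
print(per_segment)  # [0.75, 0.0, 0.0]
```

With these made-up numbers, trajectory-level credit rewards all three segments equally, while the segment-level variant credits only the first calculator call, because the critic's estimate did not move across the second one.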
