Peak-Then-Collapse and the Four Interface Channels of Knowledge-Graph Tool Use
Event: PRODUCT_LAUNCH | Date: 2026-05-26 | Impact: MEDIUM
arXiv:2605.26037v1 (Announce Type: new)

Abstract: We test the standard RLVR tool-use recipe -- GRPO on Qwen2.5-7B-Instruct -- on a deliberately minimal knowledge-graph tool API: four Freebase navigation verbs over Complex WebQuestions. Under a self-verifiable retrieval reward, the policy's tool-grounded answer rate climbs from $3.8\%$ to $9.6\%$ over 250 steps, then collapses to $0\%$ within a single 50-step window --
Source: arXiv cs.CL, 2026-05-26