When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

ArXiv CS.AI · 2026-06-01 · Authors: Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu, Dingyan Shang
