TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.25850v1 (Announce Type: new)

Abstract: This paper investigates abstention learning for large language models (LLMs), specifically the use of a ternary reward, which incentivizes truthfulness. It extends that idea by moving from a ternary reward to Trajectory-Informed Advantage Reweighting (TIAR), which dynamically reweights the abstention reward during Group Relative Policy Optimization (GRPO) training.
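To make the starting point concrete, here is a minimal Python sketch of a ternary reward feeding a GRPO-style group-normalized advantage. It is an illustration under stated assumptions, not the paper's implementation: the reward values (+1 correct, 0 abstain, -1 wrong), the function names, and the `abstain_reward` parameter are all hypothetical. The excerpt does not say how TIAR derives its reweighting from trajectories, so that step is not modeled; `abstain_reward` only marks the quantity such a scheme would adjust.

```python
from statistics import mean, pstdev


def ternary_reward(outcome: str, abstain_reward: float = 0.0) -> float:
    """Map a response outcome to a scalar reward.

    Assumed values: +1 for a correct answer, -1 for a wrong answer,
    and `abstain_reward` (default 0) for an abstention. `abstain_reward`
    is a hypothetical knob; the rule TIAR uses to set it is not given
    in the abstract excerpt.
    """
    if outcome == "correct":
        return 1.0
    if outcome == "abstain":
        return abstain_reward
    if outcome == "wrong":
        return -1.0
    raise ValueError(f"unknown outcome: {outcome!r}")


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of responses sampled for the same prompt.
outcomes = ["correct", "abstain", "wrong"]
rewards = [ternary_reward(o) for o in outcomes]
advantages = group_relative_advantages(rewards)
```

With a fixed abstention reward, the advantage an abstaining response receives depends only on the mix of outcomes in its group. Changing `abstain_reward` shifts where abstention sits relative to correct and wrong answers, which is the lever a dynamic reweighting would act on.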
