UCPO: Uncertainty-Aware Policy Optimization (Event)
PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM
arXiv:2601.22648v2 (Announce Type: replace)
Abstract: The key to building trustworthy large language models (LLMs) lies in endowing them with inherent uncertainty expression capabilities, thereby mitigating overconfident errors in high-stakes applications. However, existing RL paradigms such as GRPO often suffer from Advantage Bias due to binary decision spaces and static uncertainty rewards, inducing either excessive conservatism or overconfidence. To