Future-KL Regularized GRPO: Process-Level Credit Assignment from $f$-Divergence Regularization

ArXiv CS.CL · 2026-05-26 · Authors: Jiarui Yao, Ruida Wang, Hao Bai, Tong Zhang
