GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training (Event)

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2605.26184v1 | Announce Type: cross

Abstract: Hybrid post-training usually combines supervised fine-tuning and reinforcement learning, but fixed mixing schedules cannot adapt when the relative noise of the two signals changes over time. We propose GAC, a noise-aware controller that derives an adaptive mixing weight from online estimates of gradient variance and disagreement between the two training signals. The method adds smo
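The abstract does not give GAC's actual update rule, so the block below is only a hypothetical sketch of the general idea it describes: an adaptive SFT/RL mixing weight driven by online gradient-variance estimates and a measure of disagreement between the two gradients. Everything concrete here is an assumption, not the paper's method: the class name `NoiseAwareMixer`, the exponential-moving-average (EMA) variance estimator, the inverse-variance weighting rule, the cosine-based disagreement term, and the parameters `beta` and `eps`.

```python
import numpy as np


class NoiseAwareMixer:
    """Hypothetical noise-aware SFT/RL mixing controller (illustrative only).

    NOT the GAC algorithm: the variance estimator, the inverse-variance rule,
    and the disagreement shrinkage below are all assumptions for this sketch.
    """

    def __init__(self, beta: float = 0.9, eps: float = 1e-8):
        self.beta = beta  # EMA decay for the online statistics (assumed value)
        self.eps = eps    # guards against division by zero
        self.mean = {"sft": None, "rl": None}  # EMA of each gradient vector
        self.sq = {"sft": 0.0, "rl": 0.0}      # EMA of each squared gradient norm

    def _variance(self, key: str, g: np.ndarray) -> float:
        """Update one signal's EMA statistics and return its variance estimate."""
        b = self.beta
        if self.mean[key] is None:
            # First step: initialise the EMAs with the observed gradient.
            self.mean[key] = g.copy()
            self.sq[key] = float(g @ g)
        else:
            self.mean[key] = b * self.mean[key] + (1 - b) * g
            self.sq[key] = b * self.sq[key] + (1 - b) * float(g @ g)
        m = self.mean[key]
        # Total variance ~ E||g||^2 - ||E g||^2, clipped at zero.
        return max(self.sq[key] - float(m @ m), 0.0)

    def mix(self, g_sft: np.ndarray, g_rl: np.ndarray):
        """Return (mixed_gradient, w_rl), where w_rl weights the RL gradient."""
        var_sft = self._variance("sft", g_sft)
        var_rl = self._variance("rl", g_rl)
        # Inverse-variance weighting: the noisier signal gets the smaller weight.
        inv_sft = 1.0 / (var_sft + self.eps)
        inv_rl = 1.0 / (var_rl + self.eps)
        w_raw = inv_rl / (inv_rl + inv_sft)
        # Disagreement in [0, 1]: 0 when the gradients do not conflict
        # (cosine >= 0), 1 when they point in exactly opposite directions.
        denom = np.linalg.norm(g_sft) * np.linalg.norm(g_rl) + self.eps
        cos = float(g_sft @ g_rl) / denom
        disagreement = min(1.0, max(0.0, -cos))
        # Under conflict, shrink the weight toward an even 0.5 split.
        w_rl = 0.5 + (1.0 - disagreement) * (w_raw - 0.5)
        mixed = (1.0 - w_rl) * g_sft + w_rl * g_rl
        return mixed, w_rl


# Usage: a low-noise SFT signal and a high-noise RL signal with the same mean.
rng = np.random.default_rng(0)
mixer = NoiseAwareMixer()
for _ in range(200):
    g_sft = np.ones(10) + 0.01 * rng.standard_normal(10)
    g_rl = np.ones(10) + 10.0 * rng.standard_normal(10)
    mixed, w_rl = mixer.mix(g_sft, g_rl)
print(w_rl)  # below 0.5: the noisier RL gradient is down-weighted
```

In this sketch, inverse-variance weighting is a common way to combine two noisy estimates of the same quantity; the shrinkage toward 0.5 is one arbitrary choice for reacting to conflicting gradients, and the paper may handle disagreement quite differently.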