AlphaToken: Decoupling Adaptation and Stability for Path-Aware Response Token Valuation in LLM Post-Training

ArXiv CS.CL · 2026-06-02 · News · en · Authors: Liu Qing, Ou Wu, Yi Du
