CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

arXiv cs.CL · 2026-06-02 · Authors: Wei Tian, Yuhao Zhou, Man Lan
