Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

ArXiv CS.CL · 2026-06-03 · Authors: Runpeng Dai, Tong Zheng, Rui Liu, Chengsong Huang, Hongtu Zhu
