IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage

arXiv cs.AI | 2026-05-28 | Authors: Yuhan Li, Mingxu Zhang, Dazhong Shen, Ying Sun

Abstract

arXiv:2605.28247v1 | Announce Type: cross

Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing LLM reasoning, yet its data inefficiency remains a major bottleneck. Existing methods address this problem only partially, each missing at least one of subset-level coverage, verifier signal use, or interpretability. To address this gap, we present IRDS (Interpretable RLVR Data Selection), which selects RLVR training instances on a sparse autoencoder (SAE) cluster basis so the selection itself is auditable on recognizable problem motifs. To select instances the model both fails on and can still learn from, we introduce a verifier-coupled coverage objective on the SAE basis and solve it by greedy log-determinant maximization. Experiments on three instruction-tuned models and six math reasoning benchmarks show that IRDS achieves the highest overall accuracy, exceeding the strongest baseline by +3.9/+4.
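The abstract names greedy log-determinant maximization but does not spell out the objective, so the following is only a generic sketch of that technique, not the authors' implementation. It assumes each instance has a feature vector (standing in for an SAE-basis representation) and a nonnegative weight (standing in for some verifier-derived signal). The function name `greedy_logdet_select`, the `ridge` regularizer, and the way the weight scales the feature vector are all hypothetical choices made for illustration.

```python
import numpy as np


def greedy_logdet_select(features, weights, k, ridge=1e-3):
    """Greedily pick up to k rows that maximize
    log det(ridge * I + sum_i w_i * x_i x_i^T).

    features: (n, d) array, hypothetical per-instance feature vectors.
    weights:  (n,) nonnegative array, hypothetical verifier-derived weights.
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n, d = features.shape

    # Scaling each row by sqrt(w) makes the weight enter the Gram matrix linearly.
    scaled = features * np.sqrt(weights)[:, None]

    inv = np.eye(d) / ridge          # inverse of the current matrix M = ridge * I
    available = np.ones(n, dtype=bool)
    selected = []

    for _ in range(min(k, n)):
        # Matrix determinant lemma:
        # log det(M + x x^T) - log det(M) = log(1 + x^T M^{-1} x)
        gains = np.log1p(np.einsum("nd,de,ne->n", scaled, inv, scaled))
        gains[~available] = -np.inf
        best = int(np.argmax(gains))
        selected.append(best)
        available[best] = False

        # Sherman-Morrison rank-one update of M^{-1} after adding x x^T.
        x = scaled[best]
        mx = inv @ x
        inv -= np.outer(mx, mx) / (1.0 + x @ mx)

    return selected
```

With `features = [[1, 0], [1, 0], [0, 1]]` and `weights = [1.0, 1.0, 0.5]`, asking for two instances selects indices 0 and 2: once the first direction is covered, its duplicate offers almost no additional log-determinant gain, so the lower-weight but uncovered direction wins. This is the coverage behavior a log-determinant objective is generally chosen for.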
