Predicting Inference-Time Scaling Gains from Labeled Validation-Set Output Statistics

arXiv cs.CL, 2026-06-03. Authors: Luyang Zhang, Jingyan Li

Abstract

arXiv:2606.02981v1 (announce type: new)

Best-of-$N$ inference scaling (drawing $N$ candidate answers from a language model and returning the one a reward model ranks highest) improves accuracy by an amount that varies across models, but predicting that amount in advance currently requires running the procedure end to end. Prior work links cheap statistics of a model's sampled outputs and their validation-set correctness (how often samples agree, how diverse they are, how confident the model is, and where correct samples appear) to model behavior, but does not isolate which of these statistics form a stable, compact predictor of best-of-$N$ gain. We fit ridge predictors on features computed from a single labeled validation-set sampling pass, use bootstrap-Lasso as a stability analysis of the candidate feature set, and give a concentration analysis with an explicit linear-approximation residual.
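The abstract names the ingredients (best-of-$N$ selection, cheap output statistics from one labeled sampling pass, a ridge predictor) without giving formulas. The following Python sketch shows one way they could fit together; it is an illustration under stated assumptions, not the authors' method. Everything concrete is assumed: the sample record `(answer, reward, confidence, is_correct)`, the four feature definitions, the target (best-of-$N$ accuracy minus mean single-sample accuracy), the hypothetical helpers `output_features`, `best_of_n_gain`, `solve`, `ridge_fit` and `predict`, and the setup of training across a pool of models with one feature row and one measured gain per model.

```python
from statistics import fmean

# Hypothetical record for one sampled answer: (answer, reward, confidence, is_correct).
# One validation question = list of k such records from a single labeled sampling pass.


def output_features(questions):
    """Illustrative per-model statistics; these definitions are assumptions."""
    agree, diverse, conf, rank = [], [], [], []
    for samples in questions:
        k = len(samples)
        answers = [s[0] for s in samples]
        counts = {a: answers.count(a) for a in set(answers)}
        agree.append(max(counts.values()) / k)      # share of samples on the majority answer
        diverse.append(len(counts) / k)             # distinct answers per sample
        conf.append(fmean(s[2] for s in samples))   # mean model confidence
        # "Where correct samples appear" (assumed): normalized reward rank of the
        # best-ranked correct sample (0.0 = ranked first, 1.0 = last or none correct).
        by_reward = sorted(samples, key=lambda s: s[1], reverse=True)
        hits = [i for i, s in enumerate(by_reward) if s[3]]
        rank.append(hits[0] / max(k - 1, 1) if hits else 1.0)
    return [fmean(agree), fmean(diverse), fmean(conf), fmean(rank)]


def best_of_n_gain(questions, n):
    """Best-of-n accuracy (highest reward among the first n samples) minus
    mean single-sample accuracy; used here as the regression target."""
    best = fmean(max(q[:n], key=lambda s: s[1])[3] for q in questions)
    single = fmean(fmean(s[3] for s in q) for q in questions)
    return best - single


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam=1.0):
    """Ridge regression with an unpenalized intercept; returns (weights, intercept)."""
    d = len(X[0])
    xbar = [fmean(row[j] for row in X) for j in range(d)]
    ybar = fmean(y)
    Xc = [[row[j] - xbar[j] for j in range(d)] for row in X]
    yc = [v - ybar for v in y]
    # Normal equations on centered data: (Xc^T Xc + lam * I) w = Xc^T yc
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = solve(A, b)
    return w, ybar - sum(wj * xj for wj, xj in zip(w, xbar))


def predict(w, b0, features):
    """Predicted best-of-N gain for a new model's feature vector."""
    return b0 + sum(wj * fj for wj, fj in zip(w, features))
```

In this sketch each training-pool model contributes one row `output_features(...)` and one target `best_of_n_gain(...)`; `ridge_fit` then yields weights that `predict` applies to a new model's features, which need only the single sampling pass. The toy target is computed from the same samples purely for convenience; in practice it would come from running best-of-$N$ end to end on the pool models. A bootstrap-Lasso stability check of the kind the abstract mentions would refit an L1-penalized model on resampled rows and count how often each feature keeps a nonzero weight; that step is not sketched here.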
