On the Benefits of Free Exploration for Regret Minimization in Multi-Armed Bandits

ArXiv CS.AI, 2026-05-26. Authors: Yunlong Hou, Zixin Zhong, Vincent Y. F. Tan

Abstract

arXiv:2605.25789v1 (announce type: cross). We study a stochastic multi-armed bandit problem where an agent is granted a free exploration budget before regret accumulates, a setting not captured by the classic regret minimization or pure exploration paradigms. The goal is to design an adaptive policy that strategically explores the bandit instance in the initial free exploration phase and minimizes the cumulative regret in the subsequent phase. We formalize this regret-minimization-with-free-exploration problem and identify an interesting regime where the free exploration budget scales logarithmically with the time horizon. To quantify the amount of regret saved with high probability as a result of the availability of the free exploration phase, we introduce a novel set of policies known as $(\alpha,\beta)$-probably saving policies.
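The two-phase setting described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions and not the authors' policy: the arms are Bernoulli, the free phase uses plain round-robin pulls, the second phase uses the standard UCB1 index, and the budget is taken to be `budget_coeff * log(horizon)` rounded up. The function name `run_with_free_exploration` and the parameter `budget_coeff` are hypothetical, introduced here for illustration.

```python
import math
import random


def run_with_free_exploration(means, horizon, budget_coeff, seed=0):
    """Simulate a Bernoulli bandit with a free phase followed by a regret phase.

    Returns (budget, pseudo_regret). Only pulls made in the second phase
    contribute to the pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    # Assumption: the free budget grows logarithmically with the horizon.
    budget = math.ceil(budget_coeff * math.log(horizon))
    counts = [0] * k   # number of pulls per arm (both phases)
    sums = [0.0] * k   # total observed reward per arm (both phases)

    def pull(arm):
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    # Phase 1: free exploration (round-robin, an illustrative choice).
    for t in range(budget):
        pull(t % k)

    # Phase 2: UCB1, reusing the statistics gathered for free.
    regret = 0.0
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            total = sum(counts)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )
        pull(arm)
        regret += best - means[arm]  # expected (pseudo) regret of this pull
    return budget, regret


if __name__ == "__main__":
    for coeff in (0.0, 10.0):
        b, r = run_with_free_exploration([0.5, 0.9], horizon=1000, budget_coeff=coeff)
        print(f"budget_coeff={coeff}: budget={b}, pseudo-regret={r:.1f}")
```

Comparing a run with `budget_coeff=0.0` against one with a positive coefficient gives a rough, single-seed impression of the regret that free exploration might save; averaging over many seeds would be needed before drawing any conclusion.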