The Sample Complexity of Multiclass and Sparse Contextual Bandits 文章

ArXiv CS.AI2026-05-29NEWSen作者: Liad Erez, Fan Chen, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran, Alexander Rakhlin

摘要

arXiv:2605.29645v1 Announce Type: cross Abstract: We study contextual bandits in the stochastic i.i.d.\ setting, where a learner observes contexts drawn from an unknown distribution, selects actions from a finite set $A$, and aims to identify an approximately optimal policy from a given class based on bandit feedback. Motivated by bandit multiclass classification with zero-one rewards, we focus on the \emph{$s$-sparse} setting in which, for every context, the reward vector has $L_1$-norm at most $s \ll |A|$. Our main result is the design of algorithms that, with high probability, output an $\epsilon$-optimal policy compared to policy class $\Pi$ using $\tilde{O} ((s/\epsilon^2 + |A|/\epsilon)\log |\Pi|/\delta)$ samples. We extend this bound to general Natarajan classes and complement it with a matching lower bound (up to logarithmic factors), thereby closing a substantial gap left by prior work (Erez et al., 2024, 2025), which incurred an additional $\Theta(|A|^9)$ dependence.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据