Efficient Adversarial Attacks on High-dimensional Offline Bandits

ArXiv CS.AI, 2026-06-04. Authors: Seyed Mohammad Hadi Hosseini, Amir Najafi, Mahdieh Soleymani Baghshah

Abstract

arXiv:2602.01658v2. Bandit algorithms have recently emerged as a powerful tool for evaluating machine learning models, including generative image models and large language models, by efficiently identifying top-performing candidates without exhaustive comparisons. These methods typically rely on a reward model, often distributed with public weights on platforms such as Hugging Face, to provide feedback to the bandit. While online evaluation is expensive and requires repeated trials, offline evaluation with logged data has become an attractive alternative. However, the adversarial robustness of offline bandit evaluation remains largely unexplored, particularly when an attacker perturbs the reward model (rather than the training data) prior to bandit training. In this work, we fill this gap by investigating, both theoretically and empirically, the vulnerability of offline bandit training to adversarial manipulations of the reward model.
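The threat model in the abstract (an attacker edits the reward model's weights before the offline evaluation runs) can be made concrete with a toy example. The sketch below is not the paper's attack. It assumes a linear reward model, synthetic Gaussian "logged" features, and a greedy pick of the arm with the highest mean score in place of a full bandit algorithm. All names (`flip_ranking`, `offline_scores`, and so on) are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_features(samples):
    """Average one arm's logged feature vectors, coordinate by coordinate."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def offline_scores(w, arm_means):
    """Mean reward per arm under an assumed linear reward model r(x) = w . x."""
    return [dot(w, mu) for mu in arm_means]

def flip_ranking(w, mu_best, mu_target, margin=1e-3):
    """Smallest L2 change to w that puts the target arm ahead of the best arm by `margin`."""
    d = [t - b for t, b in zip(mu_target, mu_best)]
    gap = dot(w, d)                    # negative while the best arm is still ahead
    step = (margin - gap) / dot(d, d)  # closed-form projection onto the constraint
    return [wi + step * di for wi, di in zip(w, d)]

# Synthetic setup (assumption): 3 arms, 50 logged samples each, 200 features.
rng = random.Random(0)
dim, n_arms, n_logged = 200, 3, 50
w = [rng.gauss(0, 1) for _ in range(dim)]
logged = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_logged)]
          for _ in range(n_arms)]
arm_means = [mean_features(s) for s in logged]

clean = offline_scores(w, arm_means)
best = max(range(n_arms), key=clean.__getitem__)
target = min(range(n_arms), key=clean.__getitem__)  # attacker promotes the worst arm

w_adv = flip_ranking(w, arm_means[best], arm_means[target])
attacked = offline_scores(w_adv, arm_means)

# Relative size of the weight perturbation, ||w_adv - w|| / ||w||.
rel_change = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_adv, w))) / math.sqrt(dot(w, w))
print(f"clean best arm: {best}, promoted arm: {target}, relative weight change: {rel_change:.4f}")
```

In this toy setting the perturbation only guarantees that the promoted arm outscores the previously best arm. With more than two arms, a third arm could still end up on top, so a real attack would need to handle all pairwise constraints.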
