Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges
Event: PRODUCT_LAUNCH | Date: 2026-05-27 | Impact: MEDIUM
arXiv:2605.26156v1. Announce Type: cross.
Abstract: The known stylistic biases in LLM judges, such as a preference for verbosity or specific sentence structures, present an underexplored security vulnerability. In this work, we introduce BITE (BIas exploraTion and Exploitation), a black-box adversarial framework that learns semantics-preserving edits to mislead an LLM judge and artificially inflate the scores it assign
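The abstract is cut off before it describes the method, so the following is only a toy illustration of what "bandit-guided" selection among semantics-preserving style edits could look like. Everything in it is an assumption made for illustration: the edit functions, the `toy_judge` scorer (a stand-in with a built-in verbosity bias, not a real LLM judge), and the choice of UCB1 as the bandit rule. It is not the BITE algorithm from the paper.

```python
import math
import random

# Hypothetical style edits: each changes surface form but not the answer's meaning.
def identity(text):
    return text

def add_preamble(text):
    return "To answer thoroughly: " + text

def add_structure(text):
    return "First, " + text + " In summary, the above holds."

def pad_verbose(text):
    return text + " This point is worth stating explicitly and in detail."

EDITS = [identity, add_preamble, add_structure, pad_verbose]

def toy_judge(text, rng):
    """Stand-in for a black-box judge (assumption): a noisy score in [0, 1]
    that grows with word count, mimicking a verbosity bias."""
    base = min(len(text.split()) / 40.0, 1.0)
    return max(0.0, min(1.0, base + rng.gauss(0.0, 0.05)))

def ucb1_attack(answer, rounds=200, seed=0):
    """Use UCB1 to find which edit the judge rewards most, using only scores."""
    rng = random.Random(seed)
    counts = [0] * len(EDITS)
    totals = [0.0] * len(EDITS)
    for t in range(1, rounds + 1):
        if t <= len(EDITS):
            arm = t - 1  # try every edit once before using the UCB rule
        else:
            arm = max(
                range(len(EDITS)),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = toy_judge(EDITS[arm](answer), rng)  # only black-box access
        counts[arm] += 1
        totals[arm] += reward
    means = [totals[i] / counts[i] for i in range(len(EDITS))]
    best = max(range(len(EDITS)), key=lambda i: means[i])
    return best, means, counts

if __name__ == "__main__":
    best, means, counts = ucb1_attack("The capital of France is Paris.")
    print(EDITS[best].__name__, [round(m, 3) for m in means], counts)
```

The point of the sketch is that the attacker never inspects the judge: it only observes scores, and the bandit shifts queries toward whichever edit the judge's bias rewards. A defender could run the same loop against their own judge to measure how much a style-only change moves the score.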
Related coverage (1): Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges (arXiv cs.AI, 2026-05-27)