Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges (Event)

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2605.26156v1 | Announce Type: cross

Abstract: The known stylistic biases in LLM judges, such as a preference for verbosity or specific sentence structures, present an underexplored security vulnerability. In this work, we introduce BITE (BIas exploraTion and Exploitation), a black-box adversarial framework that learns semantics-preserving edits to mislead an LLM judge and artificially inflate the scores it assigns…
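The abstract does not say which bandit algorithm BITE uses or how its edits are produced. Purely as an illustration of the general idea, the sketch below treats each style edit as one arm of a multi-armed bandit. It uses UCB1, a standard bandit algorithm swapped in here as an assumption, to find the edit that most inflates a judge's score while observing only the scores (black-box access). The edit functions (`restate`, `bulletize`, `flatter`), the `toy_judge` with its built-in verbosity and bullet-list biases, and `ucb1_attack` are all hypothetical. None of them comes from the paper, and the sketch chooses among fixed edits rather than learning new ones.

```python
import math
import random
from typing import Callable, Dict, Tuple

# Hypothetical style edits. Each is meant to change presentation only,
# not the factual content of the answer (assumed here, never checked).
def identity(text: str) -> str:
    return text

def restate(text: str) -> str:
    # Verbosity: append a content-free summary sentence.
    return text + " In summary, the answer above addresses the question."

def bulletize(text: str) -> str:
    # Formatting: one bullet per sentence (naive split on ".").
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return "\n".join(f"- {s}." for s in sentences)

def flatter(text: str) -> str:
    # Tone: prepend a pleasantry that adds no information.
    return "Great question! " + text

EDITS: Dict[str, Callable[[str], str]] = {
    "identity": identity,
    "restate": restate,
    "bulletize": bulletize,
    "flatter": flatter,
}

def toy_judge(text: str, rng: random.Random) -> float:
    """Hypothetical stand-in for a black-box LLM judge with stylistic biases."""
    score = 5.0
    score += min(2.0, 0.02 * len(text.split()))  # verbosity bias
    if text.lstrip().startswith("- "):
        score += 1.5                             # bullet-list bias
    score += rng.gauss(0.0, 0.3)                 # sampling noise
    return max(0.0, min(10.0, score))

def ucb1_attack(
    text: str,
    edits: Dict[str, Callable[[str], str]],
    judge: Callable[[str, random.Random], float],
    rounds: int = 200,
    c: float = 1.0,
    seed: int = 0,
) -> Tuple[str, Dict[str, int], Dict[str, float]]:
    """Find the edit with the highest mean judge score, using only scores."""
    rng = random.Random(seed)
    counts = {name: 0 for name in edits}
    sums = {name: 0.0 for name in edits}
    for t in range(1, rounds + 1):
        untried = [name for name in edits if counts[name] == 0]
        if untried:
            arm = untried[0]  # play every arm once before using the index
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            arm = max(
                edits,
                key=lambda n: sums[n] / counts[n]
                + c * math.sqrt(2.0 * math.log(t) / counts[n]),
            )
        reward = judge(edits[arm](text), rng)  # one black-box query
        counts[arm] += 1
        sums[arm] += reward
    means = {n: sums[n] / counts[n] for n in edits if counts[n]}
    best = max(means, key=means.get)
    return best, counts, means

if __name__ == "__main__":
    answer = "Paris is the capital of France. It has been the capital for centuries."
    best, counts, means = ucb1_attack(answer, EDITS, toy_judge)
    print(best, counts)
```

Because the toy judge adds a fixed bonus for bullet lists, UCB1 spends a few exploratory queries on each arm and then concentrates most of its 200 queries on `bulletize`. A real attack would also need some way to verify that edits preserve meaning, which this sketch simply assumes.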