Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges (Event)

Name: Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges
Start: 2026-05-27

PRODUCT_LAUNCH, 2026-05-27, Impact: MEDIUM

arXiv:2605.26156v1 Announce Type: cross

Abstract: The known stylistic biases in LLM judges, such as a preference for verbosity or specific sentence structures, present an underexplored security vulnerability. In this work, we introduce BITE (BIas exploraTion and Exploitation), a black-box adversarial framework that learns semantics-preserving edits to mislead an LLM judge and artificially inflate the scores it assign…
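To make the idea of a bandit-guided, black-box style attack concrete, here is a minimal sketch. It is not the BITE algorithm: the abstract above is truncated and does not say which bandit strategy or edit set the authors use. The sketch assumes UCB1 as a stand-in bandit, three hypothetical style edits (`identity`, `add_filler`, `add_bullets`), and a toy judge (`toy_judge`) whose score simply grows with word count to mimic a verbosity bias. The attacker only observes scores, never the judge's internals.

```python
import math
import random


# Hypothetical style edits ("arms"). None of them changes the technical claim
# made by the answer; the names and edits are illustrative assumptions.
def identity(text):
    return text


def add_filler(text):
    return text + " To elaborate, this point deserves careful consideration."


def add_bullets(text):
    return "- " + text.replace(". ", ".\n- ")


ARMS = [identity, add_filler, add_bullets]


def toy_judge(text, rng):
    """Stand-in for a black-box LLM judge (assumption): a noisy 0-10 score
    that increases with word count, imitating a verbosity bias."""
    score = len(text.split()) / 5.0 + rng.gauss(0.0, 0.3)
    return max(0.0, min(10.0, score))


def ucb1_attack(answer, rounds=200, seed=0):
    """Use UCB1 to find which style edit most inflates the judge's score."""
    rng = random.Random(seed)
    counts = [0] * len(ARMS)
    totals = [0.0] * len(ARMS)

    for t in range(1, rounds + 1):
        if t <= len(ARMS):
            arm = t - 1  # pull every arm once before using the UCB index
        else:
            arm = max(
                range(len(ARMS)),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Reward is the judge score rescaled to [0, 1], as UCB1 expects.
        reward = toy_judge(ARMS[arm](answer), rng) / 10.0
        counts[arm] += 1
        totals[arm] += reward

    means = [totals[i] / counts[i] for i in range(len(ARMS))]
    best = max(range(len(ARMS)), key=lambda i: means[i])
    return ARMS[best].__name__, counts, means


if __name__ == "__main__":
    answer = "The cache is invalidated on write. Reads fall back to the database."
    best, counts, means = ucb1_attack(answer)
    print("best edit:", best)
    print("pull counts:", counts)
    print("mean rewards:", [round(m, 3) for m in means])
```

In this toy setting the bandit spends most of its queries on the edit that pads the answer, because that is what the biased judge rewards. A real attack would replace `toy_judge` with calls to the target judge and would need a check that each edit preserves meaning.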

Artificial Intelligence
