AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens 事件
PRODUCT_LAUNCH2026-05-28影响: MEDIUM
AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens arXiv:2512.17375v2 Announce Type: replace-cross Abstract: LLM-as-a-Judge systems supply the reward signal in modern RLHF and RLVR pipelines, but their binary verdict reduces to a single linear readout F_gap on one hidden state. We show this readout is shallow enough that short, low-perplexity tokens flip the verdict from "No" to "Yes". These tokens are sampled from the judge's own next-token distribution at th