Biases in the Blind Spot: Detecting What LLMs Fail to Mention (Event)

PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM

arXiv:2602.10117v5 · Announce Type: replace-cross

Abstract: Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible but may hide internal biases. We call these unverbalized biases. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefined categories and hand-crafted datasets. In this work, we introduce a fully automa
