Biases in the Blind Spot: Detecting What LLMs Fail to Mention
Event type: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM
arXiv:2602.10117v5 | Announce Type: replace-cross
Abstract: Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these unverbalized biases. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefined categories and hand-crafted datasets. In this work, we introduce a fully automa
Related coverage (1)
Biases in the Blind Spot: Detecting What LLMs Fail to Mention
ArXiv CS.AI, 2026-06-01