Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy (Event)

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.25603v1 · Announce Type: new

Abstract: Chain-of-thought (CoT) reasoning improves the problem-solving ability of large language models (LLMs), but the generated reasoning traces may not faithfully reflect the model's actual decision process. Existing CoT unfaithfulness detectors mainly rely on external signals from the generated rationales, such as textual plausibility or answer consistency, while overlooking …
