Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy
Event | PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM
arXiv:2605.25603v1 | Announce Type: new

Abstract: Chain-of-thought (CoT) reasoning improves the problem-solving ability of large language models (LLMs), but the generated reasoning traces may not faithfully reflect the model's actual decision process. Existing CoT unfaithfulness detectors mainly rely on external signals derived from the generated rationales, such as textual plausibility or answer consistency, while overlooking …