Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy: Event

Name: Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy
Start: 2026-05-26

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.25603v1 (Announce Type: new)

Abstract: Chain-of-thought (CoT) reasoning improves the problem-solving ability of large language models (LLMs), but generated reasoning traces may not faithfully reflect the model's actual decision process. Existing CoT unfaithfulness detectors mainly rely on external signals from generated rationales, such as textual plausibility or answer consistency, while overlooking

Artificial Intelligence

Relationship Graph

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy · Related People

L De