Teaching large language models to reason like expert diagnosticians 文章

ArXiv CS.CV2026-05-26NEWSen作者: Thomas A. Buckley, Riccardo Conci, Peter G. Brodeur, Jason Gusdorf, Sourik Beltr\'an, Bita Behrouzi, Byron Crowe, Jacob Dockterman, Muzzammil Muhammad, Sarah Ohnigian, Andrew Sanchez, James A. Diao, Aashna P. Shah, Daniel Restrepo, Eric S. Rosenberg, Andrew S. Lea, Emily Glanton, Kimberly LeBlanc, Undiagnosed Diseases Network, Marinka Zitnik, Scott H. Podolsky, Zahir Kanjee, Raja-Elie E. Abdulnour, Jacob M. Koshy, Adam Rodman, Arjun K. Manrai

摘要

arXiv:2509.12194v2 Announce Type: replace-cross Abstract: Differential diagnosis is an iterative process that integrates patient information with broader medical knowledge. Clinical case series such as the NEJM Clinicopathologic Conferences (CPCs), published continuously since 1923, feature expert physicians who demonstrate diagnostic reasoning to peers, and have been used for decades to evaluate AI. However, prior AI evaluations have largely focused on final diagnostic accuracy rather than nuanced clinical reasoning. Here, we introduce Dr. CaBot, an agentic AI system that emulates an expert diagnostician by generating written and narrated slide-based presentations from an initial case description alone. CaBot recently generated the first AI diagnosis published in the 100+ year history of the NEJM CPCs. In blinded evaluations, physicians misclassified the source of the differential (CaBot vs.