MedCTA: A Benchmark for Clinical Tool Agents


Details

Source: ArXiv CS.CL
Authors: Tajamul Ashraf, Hyewon Jeong, Fida Mohammad Thoker, Bernard Ghanem
Article type: NEWS
Language: en
Publication date: 2026-06-11

Abstract

arXiv:2606.11702v1 (announce type: cross)

To make clinically grounded decisions, medical AI agents are expected to go beyond simple recognition and be capable of tool retrieval, evidence acquisition, and integration. Existing benchmarks largely evaluate isolated perception or single-turn question answering, and therefore provide limited visibility into failures of planning, tool recruitment, and rollout reliability. We introduce MedCTA, a benchmark for evaluating medical tool agents on clinician-validated, step-implicit tasks grounded in realistic multimodal clinical inputs, including radiology images, pathology slides, and reports. MedCTA comprises 107 real-world clinical tasks with clinician-verified executable trajectories over 5 deployed tools, and supports process-aware evaluation of tool selection, argument validity, execution stability, trajectory fidelity, and outcome quality.
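The abstract names the evaluation dimensions but does not define how they are computed. As a rough illustration only, the sketch below scores a predicted tool-call trajectory against a reference one on three of those dimensions. Everything in it is an assumption made for illustration and not MedCTA's actual protocol: the `ToolCall` type, the `score_trajectory` function, the tool names, the required-argument table, and the specific formulas (set overlap for tool selection, required-key checks for argument validity, longest common subsequence for trajectory fidelity).

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One step of an agent trajectory (hypothetical structure)."""
    name: str
    args: dict = field(default_factory=dict)


def lcs_length(a, b):
    """Length of the longest common subsequence of two lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def score_trajectory(predicted, reference, required_args):
    """Score a predicted trajectory against a reference (assumed metrics)."""
    pred_names = [call.name for call in predicted]
    ref_names = [call.name for call in reference]

    # Tool selection: share of reference tools the agent called at least once.
    ref_set = set(ref_names)
    selection = len(ref_set & set(pred_names)) / len(ref_set) if ref_set else 1.0

    # Argument validity: share of predicted calls that name a known tool
    # and supply every argument that tool requires.
    valid = sum(
        1
        for call in predicted
        if call.name in required_args and required_args[call.name] <= set(call.args)
    )
    validity = valid / len(predicted) if predicted else 0.0

    # Trajectory fidelity: order-aware overlap with the reference sequence.
    fidelity = lcs_length(pred_names, ref_names) / len(ref_names) if ref_names else 1.0

    return {
        "tool_selection": selection,
        "argument_validity": validity,
        "trajectory_fidelity": fidelity,
    }


# Hypothetical tools and trajectories, invented for this example.
required = {"segment_image": {"image_path"}, "retrieve_guideline": {"query"}}
reference = [
    ToolCall("segment_image", {"image_path": "ct.png"}),
    ToolCall("retrieve_guideline", {"query": "nodule"}),
]
predicted = [
    ToolCall("retrieve_guideline", {"query": "nodule"}),
    ToolCall("segment_image", {}),  # missing the required argument
]
scores = score_trajectory(predicted, reference, required)
print(scores)
```

In this toy case the agent calls both reference tools, so tool selection is perfect, while the out-of-order calls and the missing argument lower the fidelity and validity scores. Separating the scores this way is what lets a process-aware evaluation show where a rollout failed instead of only whether the final answer was right.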
