MDIA: A Multi-Agent Diagnostic Intelligence Pipeline on HealthBench Professional

arXiv cs.AI | 2026-05-26 | Authors: Roberto Cruz, David Rey-Blanco

Abstract

arXiv:2605.24699v1. Gains reported on agentic-LLM clinical benchmarks are often attributed to prompt engineering, yet our results suggest that larger improvements can come from architectural and engine-level design. We present MDIA, a Multi-agent Diagnostic Intelligence Agent implemented as a 7-node specialty-routed clinical reasoning graph, and evaluate it on the full HealthBench Professional benchmark (n = 525) using a non-fine-tuned LLM. MDIA achieves a score of 0.6272 under OpenAI's GPT-5.4-2026-03-05, which is 3.72 pp above the performance of OpenAI's ChatGPT for Clinicians. The experiments attribute the performance lift to system architecture: specialty routing, multi-turn context preservation, drug-state safety gating, site-filtered search, length-aware synthesis, and engine-level reliability. These findings support the view that agentic clinical benchmark performance is shaped by both the underlying foundation model and the orchestration architecture.
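To make the idea of a specialty-routed graph concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the seven nodes, so the three nodes shown (`route_specialty`, `drug_safety_gate`, `synthesize`), the `CaseState` type, the keyword table, and the drug list are all hypothetical. A production router would likely use an LLM classifier rather than keyword matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CaseState:
    """Shared state passed between nodes; retains every conversation turn."""
    turns: List[str]
    specialty: str = "general"
    safety_flags: List[str] = field(default_factory=list)
    answer: str = ""


# Hypothetical keyword table, for illustration only.
SPECIALTY_KEYWORDS: Dict[str, List[str]] = {
    "cardiology": ["chest pain", "arrhythmia", "heart failure"],
    "nephrology": ["dialysis", "creatinine", "kidney"],
}

# Hypothetical watch list, for illustration only (not clinical guidance).
WATCHED_DRUGS: List[str] = ["warfarin", "methotrexate"]


def route_specialty(state: CaseState) -> CaseState:
    """Pick a specialty from the full multi-turn text, not just the last turn."""
    text = " ".join(state.turns).lower()
    for specialty, keywords in SPECIALTY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            state.specialty = specialty
            break
    return state


def drug_safety_gate(state: CaseState) -> CaseState:
    """Flag any watched drug mentioned anywhere in the conversation."""
    text = " ".join(state.turns).lower()
    state.safety_flags = [drug for drug in WATCHED_DRUGS if drug in text]
    return state


def synthesize(state: CaseState, max_chars: int = 200) -> CaseState:
    """Build a placeholder answer and cap its length (length-aware synthesis)."""
    draft = f"[{state.specialty}] Reviewed {len(state.turns)} turn(s)."
    if state.safety_flags:
        draft += " Check medication safety: " + ", ".join(state.safety_flags) + "."
    state.answer = draft[:max_chars]
    return state


# A linear three-node graph; the paper's graph has seven nodes.
PIPELINE: List[Callable[[CaseState], CaseState]] = [
    route_specialty,
    drug_safety_gate,
    synthesize,
]


def run_pipeline(turns: List[str]) -> CaseState:
    """Run every node in order over a fresh state built from the turns."""
    state = CaseState(turns=list(turns))
    for node in PIPELINE:
        state = node(state)
    return state
```

Calling `run_pipeline(["Patient on warfarin reports chest pain."])` routes the case to the hypothetical cardiology branch and records a warfarin safety flag. Because each node reads the joined turn history from the shared state, context from earlier turns survives to later nodes.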
