MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents

arXiv cs.AI, 2026-06-03. Authors: Jia Yu, Zilong Wang, Xinyang Jiang, Dongsheng Li, Shuo Wang

Abstract

arXiv:2606.03203v1. Computer-use agents could automate repetitive screen-based clinical work, but their reliability in medical graphical user interfaces remains largely unvalidated. Existing benchmarks focus on general web or desktop tasks and underrepresent medical software, which requires domain knowledge, uses UI designs markedly different from those of mainstream applications, lacks public testing environments, and demands safety validation beyond task completion. We introduce MedCUA-Bench, an interactive benchmark for clinical computer-use agents. It covers 18 clinical scenarios across 10 medical domains, reconstructed from real product manuals and open-source medical systems to capture authentic clinical interfaces while avoiding licensing and privacy constraints.
