AI Rater Discrimination Depends on Scoring Protocol in Complex Clinical Decision-Making (Event)

OPEN_SOURCE | 2026-06-03 | Impact: MEDIUM

arXiv:2606.03198v1 Announce Type: new

Abstract: Clinical AI evaluation increasingly delegates scoring to large language models (LLMs) acting as AI raters, yet their scoring behavior across evaluation conditions has not been quantitatively characterized. We address this gap through a factorial study of AI rater behavior in adult type 2 diabetes (T2D) pharmacotherapy at 12-month outpatient follow-up, a clinica