RISED: A Pre-Deployment Evaluation Framework for High-Stakes AI Decision-Support Systems, with Application to Healthcare

arXiv cs.AI, 2026-06-02. Authors: Rohith Reddy Bellibatlu, Manpreet Singh, Yash Jajoo, Shyamal Lakhanpal, Abhishek Israni

Abstract

arXiv:2605.12895v2. Clinical decision-support systems are expert systems whose recommendations clinicians act on directly, yet they are usually cleared on one aggregate accuracy number from a held-out test set. That number says nothing about input reliability under encoding shifts, subgroup gaps, threshold sensitivity, or operational feasibility. We present RISED, a pre-deployment evaluation framework operationalising five dimensions (Reliability, Inclusivity, Sensitivity, Equity, Deployability) through BCa bootstrap 95% confidence intervals, literature-grounded thresholds, and Holm-Bonferroni-corrected PASS / FAIL / INCONCLUSIVE verdicts; Equity is a proxy-dependence diagnostic rather than a gating test. Applied to seven cohorts spanning 35 years (n from 303 to 99,492), RISED surfaces failures invisible to AUROC: on Diabetes 130, Reliability passes by three orders of magnitude (PSS = 0.0004) while Inclusivity (AUC parity gap = 0.