IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures

arXiv cs.CL, 2026-06-05. Author: David Gringras

Abstract

arXiv:2604.07709v4

A heavily safety-trained model will hand a physician the full, patient-followable benzodiazepine taper and refuse it to the patient who needs it, over identical clinical facts; the knowledge is present either way. IatroBench measures that asymmetry across sixty pre-registered clinical scenarios and six frontier models (3,600 responses), scoring each on two axes, commission harm (what a response gets wrong) and omission harm (what it withholds), through a physician-authored structured evaluation validated by a second physician (weighted kappa 0.571, within-1 agreement 96%). Holding clinical content fixed and varying only whether the asker presents as patient or physician yields what we call identity-contingent withholding: all five testable models give the physician more (a decoupling gap of +0.38, p = 0.003; a 13.1-point fall in layperson hit rates on safety-colliding actions, p < 0.0001;
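The two inter-rater statistics quoted in the abstract (weighted kappa and within-1 agreement) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the abstract does not state the rating scale or whether linear or quadratic weights were used, so the 0-3 ordinal scale, the linear weights, the function names `weighted_kappa` and `within_one_agreement`, and the toy ratings are all hypothetical and are not the paper's data or code.

```python
from collections import Counter


def weighted_kappa(a, b, categories, power=1):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    power=1 gives linear disagreement weights, power=2 quadratic.
    Which scheme the paper used is not stated; linear is an assumption.
    """
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and equal length")
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    observed = Counter(zip(a, b))        # joint counts of (rater A, rater B)
    marg_a, marg_b = Counter(a), Counter(b)  # per-rater marginal counts
    obs_disagree = 0.0
    exp_disagree = 0.0
    for ci in categories:
        for cj in categories:
            w = (abs(idx[ci] - idx[cj]) / (k - 1)) ** power
            obs_disagree += w * observed[(ci, cj)] / n
            exp_disagree += w * (marg_a[ci] / n) * (marg_b[cj] / n)
    if exp_disagree == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - obs_disagree / exp_disagree


def within_one_agreement(a, b):
    """Fraction of items where the two ratings differ by at most one step."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


# Hypothetical toy ratings on an assumed 0-3 scale (not data from the paper).
rater_a = [0, 1, 2, 3, 2, 1]
rater_b = [0, 2, 2, 3, 1, 1]
print(weighted_kappa(rater_a, rater_b, categories=[0, 1, 2, 3]))
print(within_one_agreement(rater_a, rater_b))
```

Weighted kappa penalizes disagreements by how far apart the two ratings are and corrects for chance agreement, which is why it can sit well below a raw within-1 agreement rate computed on the same ratings.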
