Stress-testing medical large language models reveals latent safety pathology beyond benchmark accuracy (Event)

Name: Stress-testing medical large language models reveals latent safety pathology beyond benchmark accuracy
Start: 2026-06-09

PRODUCT_LAUNCH | 2026-06-09 | Impact: MEDIUM

arXiv:2606.07929v1
Announce Type: new

Abstract: Large language models (LLMs) are entering clinical practice based on benchmark accuracy that may fail to detect safety-relevant failure modes. Here we present AI-MASLD, a stress-audit framework that adapts the logic of metabolic stress testing from hepatology to the evaluation of clinical LLMs. Using 240 clinical cases across six narrative pertur

Artificial Intelligence
