When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations · Event

PRODUCT_LAUNCH · 2026-06-08 · Impact: MEDIUM

arXiv:2606.07237v1 · Announce Type: new

Abstract: Large Language Models (LLMs) are increasingly used in healthcare for tasks such as clinical question answering, diagnosis support, and report summarization. Despite their promise, these models remain highly sensitive to subtle prompt perturbations, both lexical and syntactic, posing serious risks in safety-critical clinical applications. In this study, we co

When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations · Related Technologies