Selective Token-Level Cryptographic Redaction for Privacy-Preserving Clinical Deployment of Large Language Models

ArXiv CS.CL | 2026-06-03 | Authors: Farhan Sheth, Ziyuan Yang, Yongying Lan, Si Yong Yeo

Abstract

arXiv:2606.03399v1 (announce type: new)

Large language models (LLMs) are increasingly used in clinical applications, yet many existing pipelines send raw sensitive health information to remote servers for processing, which heightens the risk of privacy leakage. A natural mitigation is to encrypt the data before transmission. However, straightforward solutions such as encrypting the entire dataset introduce prohibitive computational, alignment, and communication overheads, making large-scale deployment infeasible. To preserve privacy while maintaining usability, we present Healthcare Encryption & Redaction via Adaptive Linguistic Decomposition (HERALD), a token-level cryptographic redaction framework that encrypts only the sensitive tokens and preserves the surrounding context for downstream model utility.
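The general idea of encrypting only sensitive tokens, rather than the whole record, can be illustrated with a minimal Python sketch. This is not HERALD's implementation, which the abstract does not detail. Everything here is an assumption made for illustration: the regex detector `SENSITIVE` (a real system would presumably use a clinical entity recognizer), the `[ENC:...]` placeholder format, and the helper names `redact` and `restore`. The cipher is a toy HMAC-SHA256 counter-mode keystream built from the standard library so the example runs without dependencies; it provides no integrity protection and should be replaced by an authenticated cipher such as AES-GCM in any real use.

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical detector: ISO dates and "MRN<digits>" identifiers only.
SENSITIVE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\bMRN\d+\b")
# Hypothetical placeholder format carrying the hex-encoded nonce + ciphertext.
PLACEHOLDER = re.compile(r"\[ENC:([0-9a-f]+)\]")


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` bytes by running HMAC-SHA256 over nonce || counter."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_token(key: bytes, token: str) -> str:
    """XOR the token with a fresh keystream; return hex of nonce + ciphertext."""
    nonce = secrets.token_bytes(8)
    data = token.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    cipher = bytes(a ^ b for a, b in zip(data, stream))
    return (nonce + cipher).hex()


def decrypt_token(key: bytes, blob_hex: str) -> str:
    """Invert encrypt_token: split off the 8-byte nonce and XOR again."""
    blob = bytes.fromhex(blob_hex)
    nonce, cipher = blob[:8], blob[8:]
    stream = _keystream(key, nonce, len(cipher))
    return bytes(a ^ b for a, b in zip(cipher, stream)).decode("utf-8")


def redact(key: bytes, text: str) -> str:
    """Encrypt only the spans matched by SENSITIVE; leave the context as-is."""
    return SENSITIVE.sub(
        lambda m: f"[ENC:{encrypt_token(key, m.group(0))}]", text
    )


def restore(key: bytes, text: str) -> str:
    """Replace every placeholder with its decrypted original token."""
    return PLACEHOLDER.sub(lambda m: decrypt_token(key, m.group(1)), text)
```

With a key such as `secrets.token_bytes(32)`, calling `redact(key, note)` yields text in which the clinical context stays readable to a remote model while the matched identifiers appear only as ciphertext; `restore(key, ...)` on the client side recovers the original note.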