Federated learning improves site performance in multicenter deep learning without data sharing (paper)

2020 | Journal of the American Medical Informatics Association | Cited by 225
Privacy-Preserving Technologies in Data | Artificial Intelligence in Healthcare and Education | Machine Learning in Healthcare

Abstract

OBJECTIVE: To demonstrate that federated learning (FL) enables multi-institutional training without centralizing or sharing the underlying physical data. MATERIALS AND METHODS: Deep learning models were trained at each participating institution using local clinical data, and an additional model was trained using FL across all of the institutions. RESULTS: The FL model showed better performance and generalizability than the models trained at single institutions. Its overall performance was significantly better than that of any institutional model alone when evaluated on held-out test sets from each institution and on an outside challenge dataset. DISCUSSION: The power of FL was successfully demonstrated across 3 academic institutions while avoiding the privacy risk associated with the transfer and pooling of patient data. CONCLUSION: Federated learning is an effective methodology that merits further study as a way to accelerate model development across institutions and to improve generalizability in clinical use.
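The abstract does not say how the per-site updates were combined, so the following is only a minimal sketch of one common scheme, federated averaging (FedAvg), and not the authors' implementation. It assumes each site trains locally and sends back only its parameters, which a coordinator averages weighted by local sample count. A two-parameter linear model stands in for the deep learning model, and every name here (`federated_average`, `local_update`, `run_rounds`, `SITES`) and all of the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one site


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Average per-site parameter vectors, weighted by local sample count."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need exactly one sample count per site")
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Assumed local step: gradient descent on y = a*x + b, squared error."""
    a, b = weights
    for _ in range(epochs):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (a * x + b - y) for x, y in data) / len(data)
        a, b = a - lr * grad_a, b - lr * grad_b
    return [a, b]


def run_rounds(sites: Sequence[Sequence[Sample]], rounds: int = 100) -> List[float]:
    """Run FedAvg rounds; only parameters leave a site, never the samples."""
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        local = [local_update(global_w, data) for data in sites]
        global_w = federated_average(local, [len(data) for data in sites])
    return global_w


# Toy, noise-free data from y = 2x + 1; each site covers a different x range.
SITES = [
    [(-1.0, -1.0), (-0.5, 0.0), (0.0, 1.0)],
    [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
    [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)],
]
```

Calling `run_rounds(SITES)` returns a slope and intercept close to 2 and 1 on this toy data, even though no site ever sees another site's samples. Real deployments differ in many ways this sketch ignores, such as secure communication, non-convex models, and unequal or noisy site data.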