A quantitative uncertainty metric controls error in neural network-driven chemical discovery

Chemical Science, 2019. Citations: 226
Machine Learning in Materials Science; Computational Drug Discovery Methods; Metabolomics and Mass Spectrometry Studies

Abstract

[…] feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
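
The idea of a latent-space distance cutoff can be illustrated with a minimal sketch. It assumes that latent vectors (for example, activations of a hidden layer) have already been extracted for the training set and for new candidates, and it takes the mean Euclidean distance to the k nearest training points as the distance measure. The function names, the choice of k, the Euclidean metric, and the toy numbers are all assumptions made for illustration; the abstract does not specify them, and no calibration step is shown.

```python
import math


def latent_distance(train_latent, query_point, k=5):
    """Mean Euclidean distance from one query point to its k nearest
    training points in latent space (hypothetical helper)."""
    if not train_latent:
        raise ValueError("train_latent must contain at least one point")
    # Distance from the query to every training point, smallest first.
    dists = sorted(math.dist(query_point, t) for t in train_latent)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def within_cutoff(train_latent, queries, cutoff, k=5):
    """Keep only the queries whose latent distance is at or below the cutoff."""
    return [q for q in queries if latent_distance(train_latent, q, k) <= cutoff]


# Toy latent vectors (assumed values, not data from the paper).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
queries = [(0.0, 0.0), (10.0, 10.0)]

near = latent_distance(train, queries[0], k=2)  # close to the training data
far = latent_distance(train, queries[1], k=2)   # far from the training data
trusted = within_cutoff(train, queries, cutoff=1.0, k=2)
```

Lowering `cutoff` keeps fewer candidates, namely those closest to the training data; candidates that fall outside the cutoff are the natural ones to flag for further calculation or for active learning.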