Approximately Sufficient Statistics and Bayesian Computation (paper)

2008 · Statistical Applications in Genetics and Molecular Biology · Cited by 218
Machine Learning and Algorithms · Bayesian Modeling and Causal Inference · Advanced Statistical Process Monitoring

Abstract

The analysis of high-dimensional data sets is often forced to rely upon well-chosen summary statistics. A systematic approach to choosing such statistics, which is based upon a sound theoretical framework, is currently lacking. In this paper we develop a sequential scheme for scoring statistics according to whether their inclusion in the analysis will substantially improve the quality of inference. Our method can be applied to high-dimensional data sets for which exact likelihood equations are not possible. We illustrate the potential of our approach with a series of examples drawn from genetics. In summary, in a context in which well-chosen summary statistics are of high importance, we attempt to put the 'well' into 'chosen.'
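
The idea of adding summary statistics one at a time, keeping each only if it noticeably changes the inference, can be sketched on a toy problem. The sketch below is an illustration under stated assumptions, not the scoring rule developed in the paper: it uses plain rejection-based approximate Bayesian computation on a Normal(theta, 1) model, and it scores a candidate by the relative reduction in posterior standard deviation, which is a simplified stand-in. The candidate set, the function names (`abc_sample`, `select_statistics`), the prior, and the `threshold` parameter are all hypothetical choices made for this example.

```python
import random
import statistics

# Hypothetical candidate summaries for a toy Normal(theta, 1) model.
CANDIDATES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
}


def abc_sample(obs_summ, sims, names, keep):
    """Rejection ABC: return the `keep` prior draws of theta whose
    simulated summaries (restricted to `names`) lie closest to the
    observed summaries."""
    def distance(item):
        _, summ = item
        return sum((summ[n] - obs_summ[n]) ** 2 for n in names)

    # sorted() is stable, so with no statistics (all distances equal 0)
    # this returns the first `keep` prior draws, i.e. a prior sample.
    ranked = sorted(sims, key=distance)
    return [theta for theta, _ in ranked[:keep]]


def select_statistics(seed=0, n_obs=20, n_sims=2000, keep=100, threshold=0.1):
    """Consider candidates in order; keep one only if adding it shrinks
    the posterior standard deviation by more than `threshold`
    (an assumed, simplified score, not the paper's)."""
    rng = random.Random(seed)

    def draw(theta):
        return [rng.gauss(theta, 1.0) for _ in range(n_obs)]

    def summarise(data):
        return {name: f(data) for name, f in CANDIDATES.items()}

    obs_summ = summarise(draw(1.0))  # pseudo-observed data, true theta = 1

    # Simulate once from a Uniform(-5, 5) prior and reuse the simulations
    # for every candidate set of statistics.
    sims = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)
        sims.append((theta, summarise(draw(theta))))

    selected = []
    spread = statistics.stdev(abc_sample(obs_summ, sims, selected, keep))
    for name in CANDIDATES:
        trial = statistics.stdev(
            abc_sample(obs_summ, sims, selected + [name], keep)
        )
        if trial < (1.0 - threshold) * spread:
            selected.append(name)
            spread = trial
    return selected


if __name__ == "__main__":
    print(select_statistics())
```

In this toy model the sample mean is sufficient for theta, so it is accepted first; whether the later candidates pass depends on the seed and the threshold. The result also depends on the order in which candidates are considered, which any sequential scheme of this kind has to address.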