Online Variational Inference for the Hierarchical Dirichlet Process (paper)

2011 · Cited by 330
Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Statistical Methods and Inference

Abstract

The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that can be used to model mixed-membership data with a potentially infinite number of components. It has been applied widely in probabilistic topic modeling, where the data are documents and the components are distributions of terms that reflect recurring patterns (or "topics") in the collection. Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions. One limitation of HDP analysis is that existing posterior inference algorithms require multiple passes through all the data; these algorithms are intractable for very large-scale applications. We propose an online variational inference algorithm for the HDP, an algorithm that is easily applicable to massive and streaming data. Our algorithm is significantly faster than traditional inference algorithms for the HDP, and lets us analyze much larger data sets. We illustrate the approach on two large collections of text, showing improved performance over online LDA, the finite counterpart to the HDP topic model.