Bayesian Inference for PCFGs via Markov Chain Monte Carlo (paper)
2007 · Edinburgh Research Explorer (University of Edinburgh) · Citations: 280
Natural Language Processing Techniques · Algorithms and Data Compression · Topic Modeling
Abstract
This paper presents two Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference of probabilistic context-free grammars (PCFGs) from terminal strings, providing an alternative to maximum-likelihood estimation using the Inside-Outside algorithm. We illustrate these methods by estimating a sparse grammar describing the morphology of the Bantu language Sesotho, demonstrating that with suitable priors Bayesian techniques can infer linguistic structure in situations where maximum-likelihood methods such as the Inside-Outside algorithm only produce a trivial grammar.
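To make the role of a sparsity-favoring prior concrete, here is a minimal sketch of a single resampling step for the rule probabilities of one nonterminal. It assumes a symmetric Dirichlet prior over that nonterminal's expansions, which the abstract itself does not specify. The function name `sample_rule_probs`, the rule strings, and the counts are hypothetical, chosen only for illustration. It does not reproduce either of the paper's algorithms.

```python
import random


def sample_rule_probs(rule_counts, alpha, rng):
    """Draw rule probabilities for one nonterminal from Dirichlet(counts + alpha).

    Assumption: a symmetric Dirichlet prior with concentration `alpha`;
    values of alpha below 1 favor sparse probability vectors.
    """
    # A Dirichlet draw is a vector of independent Gamma draws, normalized.
    draws = {rule: rng.gammavariate(count + alpha, 1.0)
             for rule, count in rule_counts.items()}
    total = sum(draws.values())
    return {rule: g / total for rule, g in draws.items()}


# Hypothetical counts of how often each expansion of the nonterminal "Word"
# appears in the parse trees currently held by the sampler.
counts = {
    "Word -> Prefix Stem": 40,
    "Word -> Stem": 9,
    "Word -> Stem Suffix": 0,
}
rng = random.Random(0)  # fixed seed so repeated runs give the same draw
theta = sample_rule_probs(counts, alpha=0.1, rng=rng)
```

In a full sampler, a step like this would typically alternate with resampling parse trees for the observed strings given the current `theta`. With `alpha` well below 1, unused rules tend to receive probabilities close to zero, which is one way a prior can encourage a sparse grammar.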