A sticky HDP-HMM with application to speaker diarization (paper)

2011 · Project Euclid (Cornell University) · Cited by 228

Topics: Bayesian Methods and Mixture Models; Speech Recognition and Synthesis; Gaussian Processes and Bayesian Inference

Abstract

We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566–1581]. Although the basic HDP-HMM tends to over-segment the audio data, creating redundant states and rapidly switching among them, we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.
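
The abstract mentions two ideas that a small example can make concrete: a truncated approximation of the Dirichlet process, and an augmentation that controls the switching rate. The abstract gives no formulas, so the following is only a minimal sketch under stated assumptions: it uses a commonly cited finite ("weak-limit") form in which global weights are drawn as `beta ~ Dir(gamma/L, ..., gamma/L)` and each transition row as `pi_j ~ Dir(alpha * beta + kappa * delta_j)`. The names `gamma`, `alpha`, `kappa`, the truncation level `L`, and both helper functions are illustrative choices, not taken from this page. The code samples from the prior only; it is not the paper's state-sequence sampler.

```python
import random


def sample_dirichlet(concentrations, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma variates."""
    # Floor each concentration so gammavariate always receives a positive shape.
    draws = [rng.gammavariate(max(c, 1e-12), 1.0) for c in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def sample_transition_matrix(L, gamma, alpha, kappa, rng):
    """Sample an L x L transition matrix from the assumed truncated sticky prior.

    beta ~ Dir(gamma/L, ..., gamma/L)        shared global state weights
    pi_j ~ Dir(alpha * beta + kappa * e_j)   kappa adds mass to the self-transition
    Setting kappa = 0 removes the extra self-transition mass.
    """
    beta = sample_dirichlet([gamma / L] * L, rng)
    rows = []
    for j in range(L):
        conc = [alpha * b + (kappa if k == j else 0.0) for k, b in enumerate(beta)]
        rows.append(sample_dirichlet(conc, rng))
    return rows


def mean_self_transition(rows):
    """Average of the diagonal entries, i.e. the mean probability of staying put."""
    return sum(rows[j][j] for j in range(len(rows))) / len(rows)


if __name__ == "__main__":
    rng = random.Random(0)
    sticky = sample_transition_matrix(L=10, gamma=2.0, alpha=6.0, kappa=50.0, rng=rng)
    plain = sample_transition_matrix(L=10, gamma=2.0, alpha=6.0, kappa=0.0, rng=rng)
    print("mean self-transition, kappa=50:", mean_self_transition(sticky))
    print("mean self-transition, kappa=0: ", mean_self_transition(plain))
```

Under these assumptions, a larger `kappa` shifts each row's expected mass toward its diagonal entry, which is one way to read the abstract's "effective control over the switching rate".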

Authors

No data available
