Speaker Recognition for Multi-speaker Conversations Using X-vectors (Paper)

Year: 2019 · Citations: 313

Topics: Speech Recognition and Synthesis; Speech and Audio Processing; Music and Audio Processing

Abstract

Recently, deep neural networks that map utterances to fixed-dimensional embeddings have emerged as the state-of-the-art in speaker recognition. Our prior work introduced x-vectors, an embedding that is very effective for both speaker recognition and diarization. This paper combines our previous work and applies it to the problem of speaker recognition on multi-speaker conversations. We measure performance on Speakers in the Wild and report what we believe are the best published error rates on this dataset. Moreover, we find that diarization substantially reduces error rate when there are multiple speakers, while maintaining excellent performance on single-speaker recordings. Finally, we introduce an easily implemented method to remove the domain-sensitive threshold typically used in the clustering stage of a diarization system. The proposed method is more robust to domain shifts, and achieves similar results to those obtained using a well-tuned threshold.

Authors (4 of 6 listed)

Sanjeev Khudanpur

Daniel Povey

Alan McCree

Gregory Sell
