Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System (Paper)

2018 · Cited by 325
Speech Recognition and Synthesis

Abstract

In this paper, we explore the encoding/pooling layer and the loss function in end-to-end speaker and language recognition systems. First, we develop a unified and interpretable end-to-end system for both speaker and language recognition. It accepts variable-length input and produces an utterance-level result. In this end-to-end system, the encoding layer aggregates the variable-length input sequence into an utterance-level representation. Besides basic temporal average pooling, we introduce a self-attentive pooling layer and a learnable dictionary encoding layer to obtain the utterance-level representation. For the loss function in open-set speaker verification, we introduce center loss and angular softmax loss into the end-to-end system to obtain more discriminative speaker embeddings. Experimental results on the VoxCeleb and NIST LRE 07 datasets show that the proposed encoding layers and loss functions can significantly improve the performance of the end-to-end learning system.
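
The abstract names three encoding layers without giving their formulas. The NumPy sketch below shows one common way to write their forward passes, assuming frame-level features h of shape (T, D). Self-attentive pooling here scores each frame as v^T tanh(W h_t + b) and softmax-normalizes the scores over time; learnable dictionary encoding soft-assigns each frame to C learned centers mu_c with smoothing factors s_c and averages the residuals per center. The function names, the parameter names (W, b, v, mu, s) and their shapes are assumptions made for this illustration, the parameters are random rather than trained, and this is not the authors' implementation.

```python
import numpy as np

def temporal_average_pooling(h):
    """Mean over time: (T, D) frame features -> (D,) utterance vector."""
    return h.mean(axis=0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attentive_pooling(h, W, b, v):
    """Attention-weighted mean over time (sketch; parameter shapes assumed).

    h: (T, D) frame features; W: (A, D), b: (A,), v: (A,) learnable parameters.
    """
    scores = np.tanh(h @ W.T + b) @ v  # (T,) one scalar score per frame
    alpha = softmax(scores)            # (T,) frame weights summing to 1
    return alpha @ h                   # (D,) weighted sum of frames

def learnable_dictionary_encoding(h, mu, s):
    """Soft-assign frames to C centers and average residuals per center (sketch).

    h: (T, D); mu: (C, D) dictionary centers; s: (C,) smoothing factors.
    Returns a flattened (C * D,) utterance-level vector.
    """
    r = h[:, None, :] - mu[None, :, :]       # (T, C, D) residuals h_t - mu_c
    dist = (r ** 2).sum(axis=2)              # (T, C) squared distances
    w = softmax(-s[None, :] * dist, axis=1)  # (T, C) soft assignment over centers
    e = (w[:, :, None] * r).sum(axis=0) / w.sum(axis=0)[:, None]  # (C, D)
    return e.reshape(-1)

# Usage with random, untrained parameters (hypothetical sizes).
rng = np.random.default_rng(0)
T, D, A, C = 50, 8, 4, 3
h = rng.standard_normal((T, D))
print(temporal_average_pooling(h).shape)  # (8,)
print(self_attentive_pooling(h, rng.standard_normal((A, D)), np.zeros(A),
                             rng.standard_normal(A)).shape)  # (8,)
print(learnable_dictionary_encoding(h, rng.standard_normal((C, D)),
                                    np.ones(C)).shape)  # (24,)
```

Temporal average pooling and self-attentive pooling both return a D-dimensional vector, and when all attention scores are equal the latter reduces to the former. Learnable dictionary encoding returns C × D values, so its output size grows with the number of dictionary components.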
