Applying the harmonic plus noise model in concatenative speech synthesis

2001 | IEEE Transactions on Speech and Audio Processing | Cited by 325
Speech Recognition and Synthesis | Phonetics and Phonology Research | Speech and dialogue systems

Abstract

This paper describes the application of the harmonic plus noise model (HNM) for concatenative text-to-speech (TTS) synthesis. In the context of HNM, speech signals are represented as a time-varying harmonic component plus a modulated noise component. The decomposition of a speech signal into these two components allows for more natural-sounding modifications of the signal (e.g., by using different and better adapted schemes to modify each component). The parametric representation of speech using HNM provides a straightforward way of smoothing discontinuities of acoustic units around concatenation points. Formal listening tests have shown that HNM provides high-quality speech synthesis while outperforming other models for synthesis (e.g., TD-PSOLA) in intelligibility, naturalness, and pleasantness.
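
To make the decomposition in the abstract concrete, here is a minimal sketch of a single synthesis frame built as a harmonic component plus a modulated noise component. It is not the paper's implementation: the function names, the constant pitch within the frame, the raised-cosine pitch-synchronous envelope, and the use of unfiltered white noise are all simplifying assumptions made for illustration only.

```python
import math
import random


def harmonic_part(f0, amplitudes, phases, sample_rate, num_samples):
    """Sum of sinusoids at integer multiples of f0 (f0 assumed constant in the frame)."""
    out = []
    for n in range(num_samples):
        t = n / sample_rate
        out.append(sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0 * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        ))
    return out


def modulated_noise_part(f0, gain, sample_rate, num_samples, seed=0):
    """White Gaussian noise shaped by a pitch-synchronous envelope (an assumed shape)."""
    rng = random.Random(seed)
    out = []
    for n in range(num_samples):
        t = n / sample_rate
        # Raised-cosine envelope in [0, 1], repeating once per pitch period.
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * f0 * t))
        out.append(gain * envelope * rng.gauss(0.0, 1.0))
    return out


def synthesize_frame(f0, amplitudes, phases, noise_gain,
                     sample_rate=16000, num_samples=320, seed=0):
    """Frame = harmonic component + modulated noise component."""
    h = harmonic_part(f0, amplitudes, phases, sample_rate, num_samples)
    r = modulated_noise_part(f0, noise_gain, sample_rate, num_samples, seed)
    return [a + b for a, b in zip(h, r)]


# Hypothetical parameters: a 100 Hz voiced frame with three harmonics.
frame = synthesize_frame(100.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], noise_gain=0.1)
```

Because the two components are generated separately, each can be modified with its own scheme, for example rescaling `f0` for the harmonic part while leaving the noise gain untouched, which is the property the abstract credits for more natural-sounding modifications.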
