Speech recognition paper

1997, cited by 299
Speech Recognition and Synthesis

Abstract

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, for such applications as command and control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding, a subject covered in section 1.8. Speech recognition systems can be characterized by many parameters, some of the more important of which are shown in Figure 1.1. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies and is much more difficult to recognize than speech read from script. Some systems require speaker enrollment (a user must provide samples of his or her speech before using them), whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words. The simplest language model can be specified as a finite-state network, where the permissible words following each word are explicitly given. More general language models
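
As an illustration of the finite-state network idea in the abstract, where the permissible words following each word are explicitly given, here is a minimal Python sketch. The vocabulary, the `<s>` and `</s>` boundary markers, and the names `SUCCESSORS`, `is_permitted`, and `allowed_next` are all hypothetical, chosen only for this example; they do not come from the source text.

```python
# Toy finite-state language model (illustrative assumption, not from the source).
# Each word maps to the set of words that are allowed to follow it.
START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers

SUCCESSORS = {
    START: {"call", "dial"},
    "call": {"home", "office"},
    "dial": {"home", "office"},
    "home": {END},
    "office": {END},
}


def allowed_next(word):
    """Return the sorted list of words permitted after `word`."""
    return sorted(SUCCESSORS.get(word, set()))


def is_permitted(words):
    """Return True if every adjacent pair in the sequence is allowed."""
    path = [START, *words, END]
    return all(
        nxt in SUCCESSORS.get(cur, set())
        for cur, nxt in zip(path, path[1:])
    )


if __name__ == "__main__":
    print(allowed_next("call"))
    print(is_permitted(["call", "home"]))
    print(is_permitted(["home", "call"]))
```

A recognizer using such a network would only consider word hypotheses returned by `allowed_next` at each step, which is how this kind of grammar restricts the combination of words.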
