Improving End-to-End Speech Recognition for Dysarthric Speech through In-Domain Data Augmentation

ArXiv CS.AI | 2026-06-19 | NEWS | en | Authors: Paban Sapkota, Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Shrikanth Narayanan

Details

Source: ArXiv CS.AI
Authors: Paban Sapkota, Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Shrikanth Narayanan
Article type: NEWS
Language: en
Published: 2026-06-19

Abstract

arXiv:2606.19797v1 (announce type: cross)

Dysarthric speech recognition is crucial for facilitating effective communication for individuals with dysarthria. However, accurately recognizing dysarthric speech poses significant challenges due to varying severity levels and limited data availability. In this paper, we explore data augmentation techniques for dysarthric automatic speech recognition (ASR) systems by fine-tuning the pre-trained end-to-end Wav2Vec2 model, with a specific focus on severity levels. To address data scarcity and the need for extensive data when fine-tuning pre-trained ASR systems for dysarthric speech, we investigate four prominent data augmentation methods, each tailored to a different aspect of dysarthria: Speaking-Rate Modification (SRM), Pitch Modification (PM), Formant Modification (FM), and Vocal Tract Length Perturbation (VTLP). The study uses individually fine-tuned Wav2Vec2 models for each severity class as baseline systems.
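To make one of the four methods concrete, here is a minimal sketch of the frequency warp behind VTLP. The abstract does not say which formulation the authors use, so this assumes the commonly used piecewise-linear warp, in which frequencies below a boundary are scaled by a factor alpha and the rest are mapped linearly so that the Nyquist frequency stays fixed. The function name `vtlp_warp`, the 16 kHz sample rate, the 4800 Hz boundary, and the alpha values are illustrative assumptions, not values taken from the paper.

```python
def vtlp_warp(freq_hz, alpha, sample_rate=16000, f_hi=4800.0):
    """Map a frequency through a piecewise-linear VTLP warp.

    Assumed formulation (not confirmed by the abstract):
    - below the boundary, the frequency is scaled by alpha;
    - above it, a second linear segment runs up to Nyquist, so the
      warp is continuous and maps Nyquist to itself.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    nyquist = sample_rate / 2.0
    if not 0.0 <= freq_hz <= nyquist:
        raise ValueError("freq_hz must lie in [0, Nyquist]")

    # Input frequency at which the warp switches segments.
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq_hz <= boundary:
        return freq_hz * alpha

    # Slope of the upper segment, chosen so both segments meet at the
    # boundary and the curve ends at (nyquist, nyquist).
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


if __name__ == "__main__":
    # Hypothetical warp factors; augmentation pipelines typically draw
    # alpha at random from a narrow range around 1.0.
    for alpha in (0.9, 1.0, 1.1):
        warped = [round(vtlp_warp(f, alpha), 1) for f in (500.0, 2000.0, 6000.0)]
        print(alpha, warped)
```

In a full augmentation pipeline this warp would typically be applied to the frequency axis of a spectrogram or to filterbank centre frequencies, producing one perturbed copy of each utterance per sampled alpha.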
