BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language
PRODUCT_LAUNCH | 2026-06-03 | Impact: MEDIUM
arXiv:2606.03504v1 | Announce Type: new

Abstract: We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, for which no ASR resources were previously publicly available. The corpus contains 10,060 validated utterances in native Nastaliq script, derived from Mozilla Common Voice recordings. We fine-tune OpenAI Whisper-small on this corpus and repor
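As a rough sanity check on the corpus figures, the quick calculation below divides the quoted 16.8 hours by the 10,060 utterances. It assumes the hours are spread evenly across all validated utterances, and the helper name is hypothetical. The result is a mean of about six seconds per clip, well inside the 30-second audio window Whisper models process. Because this is only a mean, individual clips may be longer or shorter.

```python
# Figures quoted in the BaltiVoice abstract.
TOTAL_HOURS = 16.8
NUM_UTTERANCES = 10_060

# Whisper models consume audio in fixed 30-second windows.
WHISPER_WINDOW_S = 30.0


def mean_utterance_seconds(total_hours: float, num_utterances: int) -> float:
    """Hypothetical helper: average clip duration implied by total hours and utterance count."""
    return total_hours * 3600 / num_utterances


mean_s = mean_utterance_seconds(TOTAL_HOURS, NUM_UTTERANCES)
print(f"mean utterance length: {mean_s:.2f} s")
print(f"mean fits in one Whisper window: {mean_s < WHISPER_WINDOW_S}")
```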
Source: arXiv cs.CL, 2026-06-03