Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels

ArXiv CS.CL | 2026-06-01 | Authors: Pedro H. L. Leite, Pedro Benevenuto Valadares, Luiz W. P. Biscainho

Abstract

arXiv:2605.30457v1 (announce type: cross). Regional accent classification in Brazilian Portuguese (pt-BR) is hampered by the need for reliable labels. Large self-supervised learning (SSL) speech models are powerful, but their training pipelines dilute sociophonetic information, since accent labels are generally unreliable or are not used in the training objectives. This work introduces a novel workflow for feature extraction that uses only acoustic labels. By isolating explicit regional accent landmarks and using a phoneme-based forced aligner (ZIPA), our targeted feature set captures dialectal variance more effectively than utterance embeddings, demonstrating that localized features can outperform general-purpose architectures on accent-related tasks with minimal, objective data labels.
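
The idea of localized features can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use ZIPA's actual interface. It assumes a forced aligner has already produced phoneme segments with start and end times, and that frame-level acoustic features are available at a fixed hop. The names `Segment` and `pool_landmark_features`, the 10 ms hop, and the placeholder landmark label `"r"` are all hypothetical, chosen only for illustration. The sketch averages the frames that fall inside segments of the chosen landmark phonemes, instead of pooling over the whole utterance.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Segment:
    phone: str    # phoneme label, assumed to come from a forced aligner
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds


def pool_landmark_features(frames, segments, landmarks, hop_s=0.01):
    """Average frame-level feature vectors inside landmark phoneme segments.

    frames: list of per-frame feature vectors, one every hop_s seconds.
    Returns a dict mapping each landmark phone to the mean feature vector
    over all frames covered by its occurrences.
    """
    collected = {}
    for seg in segments:
        if seg.phone not in landmarks:
            continue  # ignore phones that are not accent landmarks
        first = int(round(seg.start / hop_s))
        last = min(int(round(seg.end / hop_s)), len(frames))
        collected.setdefault(seg.phone, []).extend(frames[first:last])
    return {
        phone: [fmean(column) for column in zip(*vectors)]
        for phone, vectors in collected.items()
        if vectors
    }


# Toy data: 10 frames of 2-dimensional features, 10 ms apart.
frames = [[float(i), 2.0 * i] for i in range(10)]
segments = [
    Segment("a", 0.00, 0.02),
    Segment("r", 0.02, 0.05),  # placeholder landmark phone
    Segment("t", 0.05, 0.10),
]
result = pool_landmark_features(frames, segments, landmarks={"r"})
print(result)
```

In this toy run only frames 2 to 4 contribute, so the resulting vector describes the landmark segment alone. An utterance-level embedding would instead mix information from every frame.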
