findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding

ArXiv CS.CL | 2026-06-17 | NEWS | en | Author: Héctor Javier Vázquez Martínez

Details

Source site: ArXiv CS.CL
Author: Héctor Javier Vázquez Martínez
Article type: NEWS
Language: en
Published: 2026-06-17

Abstract

arXiv:2603.26292v2 (Announce Type: replace)

Syllable-level units offer compact and linguistically meaningful representations for spoken language modeling and unsupervised word discovery, but research on syllabification remains fragmented across disparate implementations, datasets, and evaluation protocols. We introduce findsylls, a modular, language-agnostic toolkit that unifies classical syllable detectors and end-to-end syllabifiers under a common interface for syllable segmentation, embedding extraction, and multi-granular evaluation. The toolkit implements and standardizes widely used methods (e.g., Sylber, VG-HuBERT) and allows their components to be recombined, enabling controlled comparisons of representations, algorithms, and token rates.
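The abstract describes a common interface in which segmentation, embedding extraction, and evaluation are separate, recombinable pieces. What follows is a minimal plain-Python sketch of that idea under stated assumptions; it is not the findsylls API. The names `valley_segmenter`, `mean_pool`, `Pipeline`, and `boundary_f1`, the 50 frames-per-second rate, and the 20 ms boundary tolerance are all hypothetical. The toy valley-picking detector only stands in for a classical envelope-based method, and mean pooling is just one common way to get one vector per segment; neither is claimed to be how Sylber or VG-HuBERT work.

```python
# Hypothetical sketch of a modular syllable pipeline; NOT the findsylls API.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frames = List[List[float]]   # one feature vector per frame
Segment = Tuple[int, int]    # half-open frame range [start, end)

# Assumed interfaces: a segmenter maps frames to segments; an embedder
# maps frames plus segments to one vector per segment.
Segmenter = Callable[[Frames], List[Segment]]
Embedder = Callable[[Frames, List[Segment]], Frames]


def valley_segmenter(frames: Frames) -> List[Segment]:
    """Toy detector: cut at local minima of a per-frame energy envelope."""
    env = [sum(x * x for x in f) for f in frames]
    cuts = [i for i in range(1, len(env) - 1)
            if env[i] < env[i - 1] and env[i] <= env[i + 1]]
    edges = [0] + cuts + [len(frames)]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


def mean_pool(frames: Frames, segments: List[Segment]) -> Frames:
    """Embed each segment as the average of its frame vectors."""
    out = []
    for a, b in segments:
        dim = len(frames[a])
        out.append([sum(frames[t][d] for t in range(a, b)) / (b - a)
                    for d in range(dim)])
    return out


@dataclass
class Pipeline:
    segment: Segmenter
    embed: Embedder
    frame_rate: float = 50.0  # assumed frames per second (20 ms hop)

    def run(self, frames: Frames):
        if not frames:
            return [], [], 0.0
        segs = self.segment(frames)
        embs = self.embed(frames, segs)
        duration = len(frames) / self.frame_rate  # seconds
        return segs, embs, len(segs) / duration   # token rate: segments/s


def boundary_f1(pred: Sequence[float], ref: Sequence[float], tol: float = 0.02):
    """Greedy one-to-one matching of boundary times (seconds) within +/- tol."""
    used = set()
    hits = 0
    for p in pred:
        for j, r in enumerate(ref):
            if j not in used and abs(p - r) <= tol:
                used.add(j)
                hits += 1
                break
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    frames = [[1.0], [3.0], [0.5], [2.0], [4.0], [0.2], [1.0]]
    segs, embs, rate = Pipeline(valley_segmenter, mean_pool).run(frames)
    print(segs)  # [(0, 2), (2, 5), (5, 7)]
    print(boundary_f1([0.04, 0.10], [0.05, 0.30]))  # (0.5, 0.5, 0.5)
```

Because `Pipeline` depends only on the two callable signatures, swapping in a different segmenter or pooling rule changes a single argument, which mirrors the kind of controlled recombination the abstract describes; the returned token rate (segments per second) then gives one axis along which such configurations can be compared.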
