findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding

ArXiv CS.CL | 2026-06-17 | NEWS | en | Author: Héctor Javier Vázquez Martínez

Details

Source
ArXiv CS.CL
Author
Héctor Javier Vázquez Martínez
Article type
NEWS
Language
en
Publication date
2026-06-17

Abstract

arXiv:2603.26292v2 | Announce Type: replace

Syllable-level units offer compact and linguistically meaningful representations for spoken language modeling and unsupervised word discovery, but research on syllabification remains fragmented across disparate implementations, datasets, and evaluation protocols. We introduce findsylls, a modular, language-agnostic toolkit that unifies classical syllable detectors and end-to-end syllabifiers under a common interface for syllable segmentation, embedding extraction, and multi-granular evaluation. The toolkit implements and standardizes widely used methods (e.g., Sylber, VG-HuBERT) and allows their components to be recombined, enabling controlled comparisons of representations, algorithms, and token rates.
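The abstract mentions segmentation evaluation and token-rate comparisons but does not say how they are computed. A minimal sketch follows, assuming syllable boundaries are times in seconds, a one-to-one match within a ±50 ms tolerance (a common convention in segmentation work, not confirmed by the source), and token rate defined as segments per second. The names `boundary_prf` and `token_rate` are hypothetical and are not the findsylls API.

```python
from typing import List, Sequence, Tuple


def boundary_prf(pred: Sequence[float], ref: Sequence[float],
                 tol: float = 0.05) -> Tuple[float, float, float]:
    """Hypothetical helper: boundary precision, recall, and F1.

    Each predicted boundary is greedily matched to the closest unused
    reference boundary within +/- tol seconds (one-to-one matching).
    """
    pred_sorted: List[float] = sorted(pred)
    ref_sorted: List[float] = sorted(ref)
    used = [False] * len(ref_sorted)
    hits = 0
    for p in pred_sorted:
        best_j = None
        best_d = tol
        for j, r in enumerate(ref_sorted):
            d = abs(p - r)
            if not used[j] and d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used[best_j] = True  # each reference boundary counts at most once
            hits += 1
    precision = hits / len(pred_sorted) if pred_sorted else 0.0
    recall = hits / len(ref_sorted) if ref_sorted else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


def token_rate(segments: Sequence[Tuple[float, float]], duration_s: float) -> float:
    """Hypothetical helper: number of (start, end) segments per second of audio."""
    return len(segments) / duration_s if duration_s > 0 else 0.0


if __name__ == "__main__":
    ref = [0.20, 0.45, 0.80]
    pred = [0.21, 0.48, 0.95, 1.10]
    print(boundary_prf(pred, ref))  # precision 0.5, recall 2/3, F1 4/7
    segs = [(0.0, 0.21), (0.21, 0.48), (0.48, 0.95), (0.95, 1.10)]
    print(token_rate(segs, duration_s=2.0))  # 2.0 segments per second
```

Greedy matching keeps the sketch short. A stricter evaluation might use optimal assignment, and the tolerance window strongly affects the scores, so it should be reported alongside them.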
