Beyond Transcripts: A Renewed Perspective on Audio Chaptering

arXiv cs.CL, 2026-05-29. Authors: Fabian Retkowski, Maike Züfle, Thai Binh Nguyen, Jan Niehues, Alexander Waibel

Abstract

arXiv:2602.08979v2

Audio chaptering, the task of segmenting long-form audio into coherent sections, is increasingly important for navigating podcasts, lectures, and videos. Despite its relevance, research remains limited and text-based, leaving open key questions about how to leverage audio information, handle ASR errors, and evaluate without transcripts. We address these gaps through three contributions: (1) a systematic comparison of text-based models with acoustic features, a novel audio-only architecture (AudioSeg) operating on learned audio representations, and multimodal LLMs; (2) an empirical analysis of factors affecting performance, including transcript quality, acoustic features, duration, and speaker composition; and (3) formalized evaluation protocols contrasting transcript-dependent text-space protocols with transcript-invariant time-space protocols.
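To make the idea of a time-space protocol concrete, the following is a minimal sketch of one way chapter boundaries could be scored directly in seconds, without reference to any transcript. It is an illustration under stated assumptions, not the protocol defined in the paper: the function name `boundary_f1`, the greedy one-to-one matching, and the 5-second tolerance are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def boundary_f1(
    pred: List[float], ref: List[float], tol: float = 5.0
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for chapter boundaries given in seconds.

    Assumption (not from the paper): a predicted boundary is a hit if it lies
    within `tol` seconds of a reference boundary that has not been matched yet.
    Matching is greedy and one-to-one.
    """
    pred_sorted = sorted(pred)
    ref_sorted = sorted(ref)
    matched = [False] * len(ref_sorted)
    hits = 0

    for p in pred_sorted:
        # Look for the closest still-unmatched reference boundary within tol.
        best_idx, best_dist = None, tol
        for i, r in enumerate(ref_sorted):
            dist = abs(p - r)
            if not matched[i] and dist <= best_dist:
                best_idx, best_dist = i, dist
        if best_idx is not None:
            matched[best_idx] = True
            hits += 1

    precision = hits / len(pred_sorted) if pred_sorted else 0.0
    recall = hits / len(ref_sorted) if ref_sorted else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Boundaries are timestamps in seconds, so no transcript is involved.
    predicted = [10.0, 62.0, 300.0]
    reference = [12.0, 60.0, 180.0]
    print(boundary_f1(predicted, reference, tol=5.0))
```

Because both inputs are timestamps, the score does not change when the underlying transcript changes, for example when a different ASR system is used. A text-space metric computed over sentence or token indices would not have that property, since the indices themselves depend on the transcript.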
