Bridging What the Model Thinks and How It Speaks: Expressive Speech Generation via Self-Aware Intent-Realization Alignment

arXiv cs.CL, 2026-06-02. Authors: Kuang Wang, Lai Wei, Ping Lin, Qibing Bai, Wenkai Fang, Li Zhou, Feng Jiang, Zhongjie Jiang, Jun Huang, Yannan Wang, Haizhou Li

Abstract

arXiv:2604.11424v2

Speech Language Models (SLMs) exhibit strong semantic understanding, yet often fail to translate this capacity into expressive acoustic realization, producing speech with flattened prosody and misaligned emotion. We identify this mismatch as the semantic understanding-acoustic realization gap. Existing approaches typically rely on externally specified proxies, such as emotion labels or style prompts, which require annotations and struggle to capture dynamically evolving expressive intent throughout dialogue. To overcome these limitations, we propose SASLM (Self-Aware Speech Language Model), a proxy-free framework that bridges what the model thinks and how it speaks through self-aware intent-realization alignment: (1) Intent-Aware Bridging self-distills expressive intent from the model's own evolving semantic generation states via a Variational Information Bottleneck (VIB), thereby guiding expressive speech realization without…

The abstract above may be incomplete; see the original paper for the full text.
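
The abstract names a Variational Information Bottleneck but is cut off before giving any detail, so the following is only a minimal sketch of the generic, textbook VIB objective and not the SASLM implementation. It assumes a diagonal-Gaussian posterior and a standard-normal prior. The function names, the `beta` weight, and the toy inputs are all hypothetical, chosen for illustration.

```python
import math
import random


def vib_sample(mu, logvar, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1).

    mu and logvar are equal-length lists that a (hypothetical) encoder
    would predict from the model's semantic states.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def vib_loss(task_loss, mu, logvar, beta=1e-3):
    """Generic VIB objective: task loss plus a beta-weighted compression term.

    beta is an assumed hyperparameter; the abstract does not give a value.
    """
    return task_loss + beta * kl_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.5, -0.2, 0.0], [0.0, -1.0, 0.3]  # toy encoder outputs
    z = vib_sample(mu, logvar, rng)
    print(len(z), vib_loss(task_loss=1.0, mu=mu, logvar=logvar))
```

The KL term penalizes latents that carry more information than the task loss needs, which is the general reason a bottleneck of this kind is used to distill a compact signal from a larger hidden state.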
