Bridging What the Model Thinks and How It Speaks: Expressive Speech Generation via Self-Aware Intent-Realization Alignment 事件

PRODUCT_LAUNCH2026-06-02影响: MEDIUM

Bridging What the Model Thinks and How It Speaks: Expressive Speech Generation via Self-Aware Intent-Realization Alignment arXiv:2604.11424v2 Announce Type: replace Abstract: Speech Language Models (SLMs) exhibit strong semantic understanding, yet often fail to translate this capacity into expressive acoustic realization, producing speech with flattened prosody and misaligned emotion. We identify this mismatch as the semantic understanding-acoustic realization gap. Existing approaches typically