Sparse Autoencoders for Interpretable Emotion Control in Text-to-Speech

arXiv cs.CL | 2026-06-02 | Authors: Hongfei Du, Jiacheng Shi, Sidi Lu, Gang Zhou, Ye Gao

Abstract

arXiv:2606.01479v1 (announce type: new)

Integrating large language models (LLMs) into text-to-speech (TTS) systems has improved speech expressiveness, yet interpretable emotional control remains challenging. Existing approaches primarily rely on external conditioning or global activation steering, offering limited insight into the internal representations underlying emotional control. In this work, we analyze emotion-related variation in the semantic hidden states of LLM-based TTS models using sparse autoencoders (SAEs) to identify sparse latent features. Our analysis shows that emotional variation is distributed across multiple sparse latent features, while intervening on a small subset enables interpretable emotion control. Building on this observation, we introduce a feature-level intervention framework for bidirectional emotion induction and suppression without modifying backbone parameters.
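The abstract does not specify the SAE architecture or the exact intervention rule, so the following is only a minimal sketch of what a feature-level intervention could look like, not the authors' method. It assumes a single-layer ReLU sparse autoencoder and a simple "clamp selected features to a fixed value" rule. The weights are random stand-ins rather than a trained SAE, and the sizes, the feature indices, and the helper name `clamp_features` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64  # hypothetical hidden size and SAE dictionary size

# Random stand-in weights (assumption): a real SAE would be trained on
# hidden states collected from the frozen TTS backbone.
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_dec = np.zeros(d_model)


def encode(h):
    """Map hidden states to non-negative latent feature activations."""
    return np.maximum(0.0, h @ W_enc + b_enc)


def decode(z):
    """Map latent feature activations back to hidden-state space."""
    return z @ W_dec + b_dec


def clamp_features(h, feature_ids, value):
    """Set the chosen SAE features to `value` and apply the resulting
    change in the reconstruction to the original hidden state.

    Adding only the difference keeps the SAE's reconstruction error out
    of the edited hidden state; no backbone weights are touched.
    """
    idx = np.asarray(feature_ids, dtype=int)
    z = encode(h)
    z_new = z.copy()
    z_new[..., idx] = value
    return h + (decode(z_new) - decode(z))


h = rng.normal(size=(5, d_model))   # stand-in hidden states for 5 positions
emotion_features = [3, 17, 42]      # hypothetical emotion-related feature indices

h_suppressed = clamp_features(h, emotion_features, 0.0)  # suppression
h_induced = clamp_features(h, emotion_features, 4.0)     # induction
```

Under these assumptions, clamping to zero corresponds to suppression and clamping to a positive value corresponds to induction, which is one simple way to obtain the bidirectional control the abstract describes. Which features to select and how strongly to clamp them would have to come from the paper's own analysis.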
