LLMs Need Encoders for Semantic IDs Too 文章

ArXiv CS.AI2026-06-02NEWSen作者: Xiangyi Chen, Zelun Wang, Xinyi Li, Yi-Ping Hsu, Jaewon Yang, Jiajing Xu

摘要

arXiv:2606.00324v1 Announce Type: cross Abstract: Multimodal LLMs use dedicated encoders to bridge non-language modalities (vision encoders for images, depth models for audio codec tokens) because raw token embeddings alone cannot capture modality-specific structure. We argue that Semantic IDs (SIDs), the hierarchical codes used in generative recommendation, constitute another such modality: a SID level token's meaning depends on its prefix context, yet current systems simply add SID tokens to the vocabulary and rely on training to learn these context-dependent meanings from scratch. We propose PrefixMem, a lightweight SID encoder based on prefix n-gram memory tables that provides the LLM with structured, prefix-conditioned representations at SID token positions. Like vision encoders in multimodal LLMs, PrefixMem can be pre-trained independently and then attached to any LLM for joint training.

LLMs Need Encoders for Semantic IDs Too 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品查看全部 (1)

相关技术查看全部 (3)