H$^{2}$MT: Semantic Hierarchy-Aware Hierarchical Memory Transformer

arXiv cs.CL | 2026-05-26 | Authors: Maryam Haghifam, Zifan He, Jason Cong, Yizhou Sun

Abstract

arXiv:2605.24930v1 (announce type: new)

Transformer-based LLMs achieve strong results on many language tasks; however, long inputs remain challenging because context windows are finite, and prefill latency and memory grow rapidly with prompt length. Flat token-stream processing and chunk-based retrieval can therefore spend substantial computation and context budget on text unrelated to the query. Offline-indexed RAG additionally introduces external storage and index management overhead, and typically appends retrieved evidence as raw text, increasing prefill cost and latency. H$^{2}$MT makes long-context inference structure-aware: it builds a semantic hierarchy offline, computes a memory embedding for each node via bottom-up post-order aggregation, and routes queries coarse-to-fine at inference to prune irrelevant branches early.
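The two mechanisms named in the abstract, bottom-up post-order aggregation and coarse-to-fine routing, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how memory embeddings are aggregated or how branches are scored, so the sketch assumes mean pooling of child embeddings and cosine similarity against a query vector. The names `Node`, `aggregate`, `route`, and the `keep` parameter are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the semantic hierarchy (hypothetical structure)."""
    name: str
    embedding: Optional[List[float]] = None  # given for leaves, computed for parents
    children: List["Node"] = field(default_factory=list)


def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean; stands in for the paper's unspecified aggregator."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def aggregate(node: Node) -> List[float]:
    """Post-order traversal: embed all children first, then the parent."""
    if node.children:
        node.embedding = mean([aggregate(child) for child in node.children])
    return node.embedding


def route(root: Node, query: List[float], keep: int = 1) -> List[str]:
    """Descend level by level, keeping only the `keep` best children per node."""
    frontier, leaves = [root], []
    while frontier:
        next_frontier = []
        for node in frontier:
            if not node.children:
                leaves.append(node.name)
                continue
            ranked = sorted(
                node.children,
                key=lambda child: cosine(query, child.embedding),
                reverse=True,
            )
            next_frontier.extend(ranked[:keep])  # pruned branches are never visited
        frontier = next_frontier
    return leaves


# Toy two-level hierarchy with 2-D leaf embeddings (made-up values).
root = Node("doc", children=[
    Node("sec_a", children=[Node("a1", [1.0, 0.0]), Node("a2", [0.9, 0.1])]),
    Node("sec_b", children=[Node("b1", [0.0, 1.0]), Node("b2", [0.1, 0.9])]),
])
aggregate(root)                # offline step
hits = route(root, [1.0, 0.0]) # inference-time step
```

With `keep=1` the router scores only two section embeddings and then two leaves, so the `sec_b` subtree is never touched; raising `keep` trades extra comparisons for recall.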
