SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors

Source: ArXiv CS.CL · Published: 2026-05-26 · Type: NEWS · Language: en · Authors: Natalia Trukhina, Vadim Vashkelis

Abstract

arXiv:2605.24541v1 · Announce Type: cross

Text compression for large language model (LLM) systems is usually framed as token deletion, retrieval, summarization, or exact reconstruction. We study a more aggressive but explicitly lossy setting: compressing text into compact codes that an LLM can expand back into task-relevant meaning. We call this setting SemanticZip. Unlike lossless compression, SemanticZip does not require byte-identical reconstruction; unlike ordinary summarization, it treats model-based decompression as part of the codec and evaluates whether task-relevant semantic commitments are recovered. This paper is a pilot framework, not a benchmark claim. We formalize LLM-mediated decompression, define a protected/lossy packet architecture, and evaluate six representation regimes (structured prose, JSON, CCL-Core, CCL-Min, SemanticZip ASCII, and SemanticZip emoji) over five author-constructed diagnostic cases.
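The abstract does not specify the packet format, so the following is only an illustrative sketch of the protected/lossy idea, not the authors' implementation. It assumes, hypothetically, that protected fields are copied verbatim and never pass through the model, that the lossy part is a short code handed to an LLM for expansion, and that "recovering a semantic commitment" can be approximated by a case-insensitive phrase check. All names (`Packet`, `decompress`, `protected_intact`, `commitment_recall`, `toy_expander`, `GLOSSARY`) are invented for this example, and `toy_expander` is a fixed dictionary lookup standing in for a real LLM call.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Packet:
    # Hypothetical packet layout: "protected" fields must survive byte-for-byte,
    # "lossy_code" is a compact code that only needs to be expanded into meaning.
    protected: dict[str, str] = field(default_factory=dict)
    lossy_code: str = ""


def decompress(packet: Packet, expander: Callable[[str], str]) -> tuple[dict[str, str], str]:
    # Protected fields bypass the model and are copied verbatim;
    # only the lossy code is handed to the (LLM) expander.
    return dict(packet.protected), expander(packet.lossy_code)


def protected_intact(packet: Packet, fields: dict[str, str]) -> bool:
    # Exact check: every protected value must come back unchanged.
    return all(fields.get(key) == value for key, value in packet.protected.items())


def commitment_recall(commitments: list[str], expansion: str) -> float:
    # Crude stand-in for semantic evaluation: a commitment counts as recovered
    # if its phrase appears (case-insensitively) in the expansion.
    if not commitments:
        return 1.0
    text = expansion.lower()
    hits = sum(1 for phrase in commitments if phrase.lower() in text)
    return hits / len(commitments)


# Hypothetical toy "decompressor": a fixed glossary lookup standing in for an LLM call.
GLOSSARY = {
    "ctr:renew": "the contract is renewed",
    "notice:30d": "30 days' notice is required",
    "auto:no": "it does not auto-renew",
}


def toy_expander(code: str) -> str:
    # Expand each ";"-separated token; unknown tokens pass through unchanged.
    return "; ".join(GLOSSARY.get(token, token) for token in code.split(";"))


if __name__ == "__main__":
    packet = Packet(protected={"deadline": "2026-06-01", "amount": "1200 EUR"},
                    lossy_code="ctr:renew;notice:30d;auto:no")
    fields, expansion = decompress(packet, toy_expander)
    print(protected_intact(packet, fields))  # True
    print(round(commitment_recall(["renewed", "30 days", "penalty"], expansion), 3))  # 0.667
```

The split mirrors the distinction drawn in the abstract: the protected channel is judged by exact equality, while the lossy channel is judged only on whether the required meaning comes back. Phrase matching is a placeholder; evaluating recovered semantic commitments in practice likely needs a stronger judge than string containment.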