SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors 文章

ArXiv CS.CL2026-05-26NEWSen作者: Natalia Trukhina, Vadim Vashkelis

摘要

arXiv:2605.24541v1 Announce Type: cross Abstract: Text compression for large language model (LLM) systems is usually framed as token deletion, retrieval, summarization, or exact reconstruction. We study a more aggressive but explicitly lossy setting: compress text into compact codes that an LLM can expand into task-relevant meaning. We call this setting SemanticZip. Unlike lossless compression, SemanticZip does not require byte-identical reconstruction; unlike ordinary summarization, it treats model-based decompression as part of the codec and evaluates whether task-relevant semantic commitments are recovered. This paper is a pilot framework, not a benchmark claim. We formalize LLM-mediated decompression, define a protected/lossy packet architecture, and evaluate six representation regimes over five author-constructed diagnostic cases: structured prose, JSON, CCL-Core, CCL-Min, SemanticZip ASCII, and SemanticZip emoji.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据