Large Byte Model: Teaching Language Models About Compiled Code

Event

PRODUCT_LAUNCH | 2026-06-03 | Impact: MEDIUM

arXiv:2606.02834v1 Announce Type: cross

Abstract: Malware analysis starts with the raw bytes of an executable program, and tools to "lift" these to higher-level representations, such as assembly, are expensive and subject to error. Large Language Models (LLMs) cannot process raw byte representations and answer questions about them. To this end, we present the first byte-native LLM. Based on a vocabulary expansion technique using a b