AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

arXiv cs.AI | 2026-06-09 | News | Language: en
Authors: Xuanzhe Li, Ziyan Weng, Zhiyu Zhu, Junhui Hou

Abstract

arXiv:2606.07665v1 (Announce Type: cross)

Transformer inference increasingly depends on specialized compiler and runtime support, but real model graphs still require semantic decisions about which regions are worth specializing and which CUDA implementation families are plausible. We present AgentCompile, an LLM-guided CUDA inference compiler that uses LLM outputs only as advisory search metadata. Given compiler-derived region summaries and bounded candidate spaces, the LLM proposes semantic labels, candidate priorities, parameter hints, and risk annotations; the compiler materializes CUDA candidates through templates, checks interface and hardware constraints, validates candidates empirically, selects implementations by measured latency, and falls back when specialization is unsupported or unprofitable. In end-to-end autoregressive generation, AgentCompile averages 5.66x, 4.05x, and 4.26x speedup over PyTorch eager on Qwen3-1.7B, Qwen3-4B, and Llama-3.
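The abstract describes a split of responsibilities: the LLM only influences the order in which candidates are tried, while the compiler alone decides whether a candidate is legal, correct, and faster. The Python sketch below illustrates that control flow under stated assumptions. `Candidate`, `Advice`, `select_implementation`, the single shared-memory check, and the optional `budget` are hypothetical names and simplifications, not AgentCompile's actual API, and plain Python callables stand in for template-generated CUDA kernels.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical stand-in for one template-materialized implementation of a region."""
    name: str
    run: Callable[[], object]
    shared_mem_bytes: int = 0  # resource need, checked against a device limit


@dataclass
class Advice:
    """Hypothetical advisory LLM metadata: used only to order the search."""
    priority: Dict[str, float] = field(default_factory=dict)


def wall_clock(c: Candidate, repeats: int = 5) -> float:
    """Best-of-N wall-clock latency in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        c.run()
        best = min(best, time.perf_counter() - t0)
    return best


def select_implementation(
    reference: Candidate,
    candidates: List[Candidate],
    advice: Advice,
    max_shared_mem: int,
    is_close: Callable[[object, object], bool],
    budget: Optional[int] = None,
    measure: Callable[[Candidate], float] = wall_clock,
) -> Tuple[str, Candidate]:
    # 1. LLM priorities only reorder (and, with an assumed budget, truncate) the bounded space.
    ordered = sorted(candidates, key=lambda c: advice.priority.get(c.name, 0.0), reverse=True)
    if budget is not None:
        ordered = ordered[:budget]
    expected = reference.run()
    best, best_t = None, float("inf")
    for c in ordered:
        # 2. Hardware constraint check (simplified here to one shared-memory limit).
        if c.shared_mem_bytes > max_shared_mem:
            continue
        # 3. Empirical validation against the reference (e.g. eager) output.
        try:
            out = c.run()
        except Exception:
            continue
        if not is_close(out, expected):
            continue
        # 4. Selection by measured latency, not by the LLM's ranking.
        t = measure(c)
        if t < best_t:
            best, best_t = c, t
    # 5. Fall back when nothing valid survives or the winner is not faster.
    if best is None or best_t >= measure(reference):
        return ("fallback", reference)
    return ("specialized", best)
```

Because the priorities only reorder the loop, a mis-ranked or wrong LLM suggestion costs search time but cannot change which valid candidate wins the full search; with a `budget`, however, a poor ranking can prune the fastest candidate before it is ever measured.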
