Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

arXiv cs.CL, 2026-06-01. Authors: Anej Svete, William Merrill, Ryan Cotterell, Ashish Sabharwal

Abstract

arXiv:2605.30523v1 (announce type: cross)

Recent work describes what transformers can and cannot compute through connections to Boolean circuits, but existing results lack exact characterizations and are sensitive to modeling choices. Padded transformers, to whose input filler symbols such as "..." are appended, emerge as a useful gadget for establishing equivalences to circuit classes by providing polynomial space for adaptive parallel computation. However, only a limited set of padded transformer idealizations has been studied, leaving open how robustly these equivalences hold under changes to attention type, model width, and uniformity. We find that, under practical assumptions, padded transformers are surprisingly robust to all of these, and identify numeric precision and model depth as the main factors affecting expressivity.
