VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents 文章

ArXiv CS.CV2026-05-27NEWSen作者: JunJia Guo (Joe), Yuhang Yao (Joe), Jiawei (Joe), Zhou, Jingdi Chen

摘要

arXiv:2605.26144v1 Announce Type: cross Abstract: We present VISTA (VIsual Spec-To-App Benchmark), a benchmark for evaluating the end-to-end web-app generation capabilities of LLM-based agents. Unlike prior code generation benchmarks that focus on algorithmic tasks, VISTA targets realistic UI-centric development, where agents must produce functional, visually coherent applications from underspecified inputs. We define five prompt-information conditions that vary along two axes, visual/structural fidelity and stack constraint: (1) text only with free stack choice, (2) text with reference screenshots under three specified stacks, (3) text with reference screenshots under free stack choice, (4) text with screenshots and pruned Figma structure under a single specified stack, and (5) text with screenshots and pruned Figma structure under free stack choice.