Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation
Event: PRODUCT_LAUNCH · 2026-05-29 · Impact: MEDIUM
arXiv:2605.30000v1 · Announce Type: new
Abstract: Front-end web code has become a core product surface for every frontier LLM release, yet evaluating these interactive applications at development speed remains costly because human-judged leaderboards like Arena do not scale. Existing automated proxies typically lean on reference implementations, test suites, or rigid checklists, and tend to miss the reasoned synthesi
Related coverage
Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation
ArXiv CS.AI · 2026-05-29