MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents


Details

Source: ArXiv CS.CL
Authors: Lawrence Keunho Jang, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov
Article type: NEWS
Language: en
Published: 2026-06-16

Abstract

arXiv:2606.16748v1 (announce type: cross)

Current benchmarks for computer-use agents evaluate models in impersonal environments. This leaves a gap between evaluation and deployment, where personal assistants are expected to work across a user's whole digital life, including their context, historical data, and logged-in accounts. The gap is widest on web tasks: live web evaluations cannot exercise sites that require logging in or personal information, which is the kind of site a real personal assistant has to drive. We introduce MyPCBench, which tests computer-use agents as personal assistants on a Linux desktop populated with 17 simulated real-world web applications and a full desktop stack, all seeded for one canonical persona, Michael Scott from The Office. We define 184 tasks in this environment, each inspired by a real request drawn from the OpenClaw community, and benchmark six closed and open-weight models with a uniform computer+bash tool surface.
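The abstract does not describe how the "uniform computer+bash tool surface" is implemented. As a rough illustration only, the sketch below shows one way a harness could expose the same two tools to every model and reject anything else. All names here (`ToolCall`, `ToolSurface`, `make_stub_surface`) and the payload fields are hypothetical and are not taken from MyPCBench; the handlers are stubs that return strings rather than driving a desktop or a shell.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ToolCall:
    """One model-issued action. Hypothetical shape, not the benchmark's schema."""
    tool: str      # expected to be "computer" or "bash"
    payload: dict  # tool-specific arguments, e.g. {"command": "ls"}


@dataclass
class ToolSurface:
    """Routes tool calls to handlers and records every accepted call."""
    handlers: Dict[str, Callable[[dict], str]]
    log: List[ToolCall] = field(default_factory=list)

    def dispatch(self, call: ToolCall) -> str:
        # Every model sees the same tool names; unknown tools are rejected.
        if call.tool not in self.handlers:
            raise ValueError(f"unknown tool: {call.tool!r}")
        self.log.append(call)
        return self.handlers[call.tool](call.payload)


def make_stub_surface() -> ToolSurface:
    """Build a surface with stub handlers (assumption: a real harness would
    drive a GUI and a shell here instead of echoing the request)."""
    return ToolSurface(handlers={
        "computer": lambda p: f"computer:{p.get('action', 'noop')}",
        "bash": lambda p: f"bash:{p.get('command', '')}",
    })


if __name__ == "__main__":
    surface = make_stub_surface()
    print(surface.dispatch(ToolCall("bash", {"command": "ls"})))
    print(surface.dispatch(ToolCall("computer", {"action": "click"})))
```

Keeping the handlers injectable means the same dispatch logic could be shared across models while the backends are swapped out, which is one plausible reading of "uniform" in the abstract.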
