CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs

ArXiv CS.AI | 2026-05-29 | Authors: Chelsea Zou, Yiheng Yao, Selena She, Noah Goodman, Robert D. Hawkins

Abstract

arXiv:2605.09823v2

Personal AI assistants are beginning to act as delegates with access to calendars, inboxes, and user preferences. Calendar scheduling makes the trust problem concrete: an assistant must coordinate with other assistants while deciding what to reveal about the person it represents. We introduce CalBench, a controlled benchmark for multi-agent calendar scheduling under private information. In each task, $N$ agents manage separate private calendars and schedule a stream of $M$ incoming meetings while minimizing disruption costs. Because no agent can inspect another agent's calendar, success requires language-mediated coordination rather than centralized planning. CalBench generates solvable scenarios with CP-SAT oracle solutions and decentralized non-LLM reference protocols, enabling evaluation of task success, excess cost, communication efficiency, burden fairness, and privacy leakage under matched information constraints.
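To make the oracle and the excess-cost metric concrete, here is a minimal sketch of a centralized reference solver for a toy instance. It is not CalBench's implementation. The abstract names CP-SAT, but this sketch substitutes exhaustive search so that it runs with the standard library alone. The cost table, the rule that an agent cannot attend two new meetings in the same slot, and the names `oracle_schedule` and `excess_cost` are all assumptions introduced for illustration.

```python
from itertools import product

def oracle_schedule(costs, meetings, num_slots):
    """Brute-force stand-in for a CP-SAT oracle (assumed cost model).

    costs[agent][slot] is the hypothetical disruption cost that agent pays
    for attending a new meeting in that slot. meetings is a list of
    participant tuples. Returns (best_cost, best_assignment), where
    best_assignment[i] is the slot chosen for meetings[i], or (None, None)
    if no feasible assignment exists.
    """
    best_cost, best_assignment = None, None
    for assignment in product(range(num_slots), repeat=len(meetings)):
        booked = set()   # (agent, slot) pairs already taken in this candidate
        total = 0
        feasible = True
        for participants, slot in zip(meetings, assignment):
            for agent in participants:
                if (agent, slot) in booked:
                    feasible = False  # agent would be double-booked
                    break
                booked.add((agent, slot))
                total += costs[agent][slot]
            if not feasible:
                break
        if feasible and (best_cost is None or total < best_cost):
            best_cost, best_assignment = total, assignment
    return best_cost, best_assignment

def excess_cost(protocol_cost, oracle_cost):
    """Extra cost a decentralized protocol pays relative to the oracle."""
    return protocol_cost - oracle_cost

# Toy instance: N = 3 agents, M = 3 meetings, 3 time slots.
costs = {
    "A": [0, 2, 5],
    "B": [3, 0, 1],
    "C": [1, 4, 0],
}
meetings = [("A", "B"), ("B", "C"), ("A", "C")]

best_cost, best_assignment = oracle_schedule(costs, meetings, num_slots=3)
```

The oracle sees every calendar at once, which is exactly what the agents in the benchmark cannot do. A protocol that reached a schedule of cost 10 on this instance would have an excess cost of `excess_cost(10, best_cost)`. Exhaustive search grows exponentially in the number of meetings, so a constraint solver would be the practical choice beyond toy sizes.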
