DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention

ArXiv CS.AI | 2026-06-06 | Authors: Najmul Hasan, Prashanth BusiReddyGari

Abstract

arXiv:2602.13255v2 (announce type: replace)

We present DPBench, a benchmark for evaluating coordination in multi-agent systems built from large language models. Existing benchmarks measure task-level success under a fixed protocol; the structural conditions under which coordination succeeds or fails at all have not been characterised. DPBench adapts the Dining Philosophers problem into a controlled testbed where the action protocol, the communication structure, and the group size each vary independently. We evaluate six agents: GPT-5.2, Claude Opus 4.5, Grok 4.1, Gemini 2.5 Flash, Llama 4 Maverick, and a uniform-random baseline. Under simultaneous action at N=5 with the default prompt, deadlock ranges from 25.0% (95% Wilson CI [11.2, 46.9]) for GPT-5.2 to 90.0% [74.4, 96.5] for Gemini 2.5 Flash; sequential action is solved by four of the six. Holding the model fixed at Gemini 2.
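The deadlock rates above are reported with 95% Wilson score intervals. As a rough illustration of how such an interval is computed, here is a minimal Python sketch. The function name `wilson_interval` and the run counts used below (5 deadlocks in 20 runs, 27 in 30) are assumptions for illustration only: the abstract does not state how many runs each configuration used, and the counts were chosen because they give the quoted point estimates of 25.0% and 90.0%.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval (lower, upper) for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    # Centre of the interval is pulled towards 0.5 relative to the raw proportion.
    centre = (p + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts (not taken from the paper): deadlocks out of total runs.
for label, deadlocks, runs in [("25.0% case", 5, 20), ("90.0% case", 27, 30)]:
    lo, hi = wilson_interval(deadlocks, runs)
    print(f"{label}: {100 * deadlocks / runs:.1f}% [{100 * lo:.1f}, {100 * hi:.1f}]")
```

Under these assumed counts the sketch gives intervals of roughly [11.2, 46.9] and [74.4, 96.5] percent, which is consistent with the figures quoted in the abstract, though the actual run counts may differ. The Wilson interval is a common choice for small samples because, unlike the normal-approximation interval, it stays within [0, 1] and behaves sensibly for proportions near 0 or 1.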
