PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing

ArXiv cs.CL, 2026-05-29. Authors: Krzysztof Żurawicki, Julia Farganus, Arkadiusz Gaweł, Mateusz Bystroński, Tomasz Jan Kajdanowicz

Abstract

arXiv:2605.29815v1 (announce type: cross)

The growing number of submitted papers has motivated the exploration of Large Language Models (LLMs) as a means to support and augment the peer review process, particularly in terms of improving its speed and scalability. Yet it remains unknown whether LLMs engage with scientific manuscripts in the same manner as human reviewers, or whether they merely produce review-looking text. To address this, we introduce the Peer Review AI Benchmark (PRAIB), a novel framework comprising thoroughly defined metrics that measure review specificity, style, and behavior of engagement. To complement the PRAIB framework, we conduct a large-scale empirical study leveraging a dataset of 11,000 reviews generated by five proprietary and open-source models for 1,000 ICLR and NeurIPS papers.
