SciIntegrity-Bench: A Benchmark for Evaluating Academic Integrity in AI Scientist Systems · Event

BREAKTHROUGH · 2026-06-04 · Impact: HIGH

arXiv:2605.10246v2 · Announce Type: replace

Abstract: AI scientist systems are increasingly deployed for autonomous research, yet their academic integrity has never been systematically evaluated. We introduce SciIntegrity-Bench, the first benchmark designed around a dilemmatic evaluation paradigm: each of its 33 scenarios across 11 trap categories is constructed so that honest acknowledgment of failure is th
