EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning

ArXiv CS.CL | 2026-06-17 | NEWS | en | Authors: Ayesha Gull, Muhammad Usman Safder, Rania Elbadry, Fan Zhang, Veselin Stoyanov, Preslav Nakov, Zhuohan Xie


Details

Source site: ArXiv CS.CL
Authors: Ayesha Gull, Muhammad Usman Safder, Rania Elbadry, Fan Zhang, Veselin Stoyanov, Preslav Nakov, Zhuohan Xie
Article type: NEWS
Language: en
Publication date: 2026-06-17

Original text

Abstract

arXiv:2511.01650v3 (announce type: replace)

Large Language Models (LLMs) are increasingly entering specialized, safety-critical engineering workflows governed by strict quantitative standards and immutable physical laws, making rigorous evaluation of their reasoning capabilities imperative. However, existing benchmarks such as MMLU, MATH, and HumanEval assess isolated cognitive skills, failing to capture the physically grounded reasoning central to engineering, where scientific principles, quantitative modeling, and practical constraints must converge. To enable verifiable process supervision in engineering, we introduce EngTrace, a symbolic benchmark built on 90 parameterized templates, each generating unique, contamination-resistant problem instances, spanning three major engineering branches, nine core domains, and 20 distinct areas, yielding 1,350 test cases that stress-test generalization across diverse physical scenarios.
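To make the idea of a parameterized template concrete, here is a minimal Python sketch of how one such template might work. It is not EngTrace's implementation: the function name `axial_stress_template`, the `Instance` type, the rod-under-tension scenario, and the parameter ranges are all hypothetical choices made for illustration. The sketch samples numeric parameters from a seed, computes the ground-truth answer from the governing formula, and records the intermediate steps that a process-level grader could check.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    question: str
    steps: tuple  # ordered intermediate results a grader could verify
    answer_mpa: float


def axial_stress_template(seed: int) -> Instance:
    """Hypothetical template: normal stress in a round rod under axial tension."""
    rng = random.Random(seed)  # seeding makes each instance reproducible

    # Sampled parameters (ranges are illustrative assumptions).
    force_kn = rng.randint(10, 200)
    diameter_mm = rng.choice([10, 12, 16, 20, 25])

    # Ground truth follows from the formula, not from a stored answer key.
    area_mm2 = math.pi * diameter_mm ** 2 / 4
    stress_mpa = force_kn * 1000 / area_mm2  # N / mm^2 equals MPa

    question = (
        f"A solid round rod of diameter {diameter_mm} mm carries an axial "
        f"tensile load of {force_kn} kN. Find the normal stress in MPa."
    )
    steps = (
        f"A = pi * d^2 / 4 = {area_mm2:.2f} mm^2",
        f"sigma = F / A = {stress_mpa:.2f} MPa",
    )
    return Instance(question, steps, round(stress_mpa, 2))


if __name__ == "__main__":
    # Different seeds give different numeric instances of the same problem.
    for seed in range(3):
        inst = axial_stress_template(seed)
        print(inst.question)
        for step in inst.steps:
            print("  ", step)
```

Because the numbers are drawn fresh for each seed, a memorized answer to one instance does not transfer to another, which is the sense in which such instances can resist contamination. If the 1,350 test cases are spread evenly over the 90 templates, that works out to 15 instances per template; the abstract does not state this split, so treat it as an inference.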
