FEM-Bench: A Structured Scientific Reasoning Benchmark for Evaluating Code-Generating LLMs

ArXiv CS.AI · 2026-06-01 · Authors: Saeed Mohammadzadeh, Erfan Hamdi, Joel Shor, Emma Lejeune
