LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs 文章

ArXiv CS.AI2026-05-26NEWSen作者: Zenghui Zhou, Man Li, Xiaoke Fang, Xinyi Zhou, Weibin Li, Zheng Zheng

摘要

arXiv:2605.23965v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve strong performance on logical reasoning benchmarks, yet their reliability remains uncertain. Existing evaluations rely on static benchmarks, which fail to assess robustness under logically equivalent transformations and often overestimate reasoning capability. We propose LGMT (Logic-Grounded Metamorphic Testing), an oracle-free framework that leverages first-order logic (FOL) to evaluate LLM reasoning. By deriving metamorphic relations from formal logical equivalences, LGMT constructs semantically invariant test cases and detects reasoning defects through cross-case consistency checking. Experiments on six state-of-the-art LLMs show that LGMT exposes substantial hidden defects missed by traditional reference-based evaluations.

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs 文章

摘要

相关事件查看全部 (2)

相关公司

相关人物

相关产品查看全部 (1)

相关技术查看全部 (5)