LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs (Event)

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.23965v1. Announce Type: new.

Abstract: Large Language Models (LLMs) achieve strong performance on logical reasoning benchmarks, yet their reliability remains uncertain. Existing evaluations rely on static benchmarks, which fail to assess robustness under logically equivalent transformations and often overestimate reasoning capability. We propose LGMT (Logic-Grounded Metamorphic Testing), an oracle
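The idea of checking robustness under logically equivalent transformations can be illustrated with a small sketch. This is not the LGMT implementation, whose details are not given in the excerpt above; it is a generic metamorphic check over propositional formulas. The tuple-based formula encoding, the `contrapositive` transformation, and both reasoners (`entails` and the deliberately brittle `shallow_reasoner`) are hypothetical stand-ins for a model under test.

```python
from itertools import product

# Hypothetical encoding: formulas are nested tuples such as
# ("var", "p"), ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g).

def evaluate(formula, env):
    """Evaluate a formula under a truth assignment (dict: name -> bool)."""
    op = formula[0]
    if op == "var":
        return env[formula[1]]
    if op == "not":
        return not evaluate(formula[1], env)
    if op == "and":
        return evaluate(formula[1], env) and evaluate(formula[2], env)
    if op == "or":
        return evaluate(formula[1], env) or evaluate(formula[2], env)
    if op == "implies":
        return (not evaluate(formula[1], env)) or evaluate(formula[2], env)
    raise ValueError(f"unknown operator: {op}")

def variables(formula):
    """Collect the variable names that occur in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(variables(sub) for sub in formula[1:]))

def equivalent(f, g):
    """Truth-table check that two formulas agree on every assignment."""
    names = sorted(variables(f) | variables(g))
    return all(
        evaluate(f, dict(zip(names, vals))) == evaluate(g, dict(zip(names, vals)))
        for vals in product([False, True], repeat=len(names))
    )

def contrapositive(formula):
    """Rewrite (a -> b) as (not b -> not a); leave other formulas unchanged."""
    if formula[0] == "implies":
        _, a, b = formula
        return ("implies", ("not", b), ("not", a))
    return formula

def entails(premises, conclusion):
    """Stand-in for a reliable reasoner: brute-force semantic entailment."""
    names = sorted(set().union(variables(conclusion),
                               *(variables(p) for p in premises)))
    for vals in product([False, True], repeat=len(names)):
        env = dict(zip(names, vals))
        if all(evaluate(p, env) for p in premises) and not evaluate(conclusion, env):
            return False
    return True

def shallow_reasoner(premises, conclusion):
    """Stand-in for a brittle reasoner: matches only literal modus ponens."""
    return any(
        p[0] == "implies" and p[2] == conclusion and p[1] in premises
        for p in premises
    )

def metamorphic_check(reasoner, premises, conclusion, transform):
    """Return True if the reasoner's answer survives the transformation."""
    transformed = [transform(p) for p in premises]
    if not all(equivalent(p, t) for p, t in zip(premises, transformed)):
        raise ValueError("transformation is not equivalence-preserving")
    return reasoner(premises, conclusion) == reasoner(transformed, conclusion)

p, q = ("var", "p"), ("var", "q")
premises = [("implies", p, q), p]
print(metamorphic_check(entails, premises, q, contrapositive))
print(metamorphic_check(shallow_reasoner, premises, q, contrapositive))
```

In this sketch the check compares a reasoner's answers on the original and the rewritten premises, so no labeled ground-truth answer is needed for the comparison itself. The brute-force reasoner gives the same answer on both forms, while the pattern-matching one changes its answer once the implication is stated as its contrapositive.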
