LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs (Event)

Name: LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs
Start: 2026-05-26

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.23965v1 (Announce Type: new)

Abstract: Large Language Models (LLMs) achieve strong performance on logical reasoning benchmarks, yet their reliability remains uncertain. Existing evaluations rely on static benchmarks, which fail to assess robustness under logically equivalent transformations and often overestimate reasoning capability. We propose LGMT (Logic-Grounded Metamorphic Testing), an oracle
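The abstract is cut off before it describes the method, so the following is only a generic illustration of what "robustness under logically equivalent transformations" can mean, not the paper's procedure. It is a minimal Python sketch that assumes a single transformation (contraposition), checks the equivalence by truth table, and then tests whether a model gives the same answer to both phrasings. The names `logically_equivalent`, `consistent`, and `brittle_model` are hypothetical, and `brittle_model` is a toy stand-in for an LLM.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b."""
    return (not a) or b


def original(v: Assignment) -> bool:
    # p -> q
    return implies(v["p"], v["q"])


def contrapositive(v: Assignment) -> bool:
    # not q -> not p (logically equivalent to p -> q)
    return implies(not v["q"], not v["p"])


def converse(v: Assignment) -> bool:
    # q -> p (NOT equivalent to p -> q)
    return implies(v["q"], v["p"])


def logically_equivalent(f: Formula, g: Formula, variables: List[str]) -> bool:
    """Compare two formulas on every truth assignment (brute-force truth table)."""
    for values in product([False, True], repeat=len(variables)):
        v = dict(zip(variables, values))
        if f(v) != g(v):
            return False
    return True


def consistent(model: Callable[[str], str], prompt_a: str, prompt_b: str) -> bool:
    """Metamorphic relation: equivalent prompts should receive the same answer."""
    return model(prompt_a) == model(prompt_b)


def brittle_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM: reacts to surface wording only.
    return "invalid" if "not" in prompt.lower() else "valid"


# Two phrasings of the same argument; the second uses the contrapositive premise.
prompt_a = "If it rains, the ground is wet. It rains. Is 'the ground is wet' a valid conclusion?"
prompt_b = "If the ground is not wet, it does not rain. It rains. Is 'the ground is wet' a valid conclusion?"

print(logically_equivalent(original, contrapositive, ["p", "q"]))
print(consistent(brittle_model, prompt_a, prompt_b))
```

No ground-truth label is needed for the consistency check: a disagreement between the two answers signals unreliable reasoning even when the correct answer is unknown, which is the general appeal of metamorphic testing.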

Artificial Intelligence
