Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

ArXiv CS.CL | 2026-05-28 | Authors: Leizhen Zhang, Shuhan Chen, Sheng Chen

Abstract

arXiv:2605.28602v1 (cross-listed)

Large language models (LLMs) are increasingly used for tasks that implicitly reduce to Boolean satisfiability (SAT), yet their reasoning ability on SAT remains unclear. We present a systematic study of LLMs on 2-SAT and 3-SAT, together with two canonical reductions, Vertex Cover and discrete 3D packing, to probe representation-invariant reasoning. We first evaluate models using conventional metrics, including accuracy, precision, recall, and F1, as well as the SAT phase-transition setting. We find that these metrics can be misleading: many models obtain high scores by over-predicting satisfiable formulas, fail to reproduce the classical easy-hard-easy signature around the 3-SAT threshold, and degrade sharply as the number of variables grows.
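
The point about over-predicting satisfiable formulas can be illustrated with a small sketch. The 70/30 label split below is a hypothetical dataset chosen for illustration, not a figure from the paper, and `binary_metrics` is an illustrative helper name. A predictor that answers "satisfiable" for every formula, without any reasoning, still scores well on recall and F1 while recognizing no unsatisfiable instance at all.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary metrics, treating True (satisfiable) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Recall on the negative class: share of unsatisfiable formulas correctly flagged.
    unsat_recall = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "unsat_recall": unsat_recall,
    }


# Hypothetical benchmark: 70 satisfiable and 30 unsatisfiable formulas (assumed split).
labels = [True] * 70 + [False] * 30

# Degenerate "model" that predicts satisfiable for every formula.
always_sat = [True] * len(labels)

scores = binary_metrics(labels, always_sat)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under this assumed split, recall on satisfiable formulas is perfect and F1 is high, yet recall on unsatisfiable formulas is zero. Reporting the negative-class recall alongside the usual metrics is one simple way to expose this kind of bias.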
