Testing LLM Arithmetic Reasoning Generalization with Automatic Numeric-Remapping Attacks

ArXiv CS.AI, 2026-06-03 (News)
Authors: Malia Barker, Bishal Lakha, Edoardo Serra, Francesco Gullo

Abstract

arXiv:2606.03606v1 (Announce Type: cross)

Large language models achieve strong performance on arithmetic reasoning benchmarks, and one common response to arithmetic brittleness is to delegate computation to code. Yet models are still often used in settings where they must reason directly from natural language, and trustworthy models should solve small-number arithmetic word problems without external tools. Prior work shows that LLMs are sensitive to numerical variation: a model may solve an original problem but fail on structurally similar variants requiring the same reasoning procedure with different numbers. We ask whether this fragility persists under a stricter setting involving small, schema-preserving numeric changes that retain the original reasoning program and avoid large-number stress tests. We introduce an automatic algorithm for generating numeric-remapping attacks on arithmetic word problems.
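The abstract does not describe how the attack algorithm works. The sketch below is only a rough illustration of what a schema-preserving numeric remapping could look like.

It makes three assumptions:
- A problem is stored as a template with named number slots, plus a "program" that computes the answer from those slots.
- New variants are produced by sampling new small integers for the slots and rerunning the same program.
- A variant is kept only if its answer is a small non-negative integer.

The names `Problem` and `remap_numbers`, the value ranges, and this validity filter are hypothetical. They are not the authors' algorithm.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Problem:
    """Hypothetical representation: a word-problem template plus its reasoning program."""
    template: str                               # question text with {name} slots
    numbers: Dict[str, int]                     # original value for each slot
    program: Callable[[Dict[str, int]], float]  # same reasoning program for every variant


def remap_numbers(problem: Problem, k: int, low: int = 1, high: int = 20,
                  max_answer: int = 100, seed: int = 0,
                  max_tries: int = 10_000) -> List[dict]:
    """Sketch: generate up to k variants that keep the template and program
    but use different small numbers. Not the paper's method."""
    rng = random.Random(seed)
    # Treat the original assignment as already seen so it is never returned.
    seen = {tuple(sorted(problem.numbers.items()))}
    variants: List[dict] = []
    for _ in range(max_tries):
        if len(variants) == k:
            break
        # Sample a fresh small integer for every slot (low >= 1 avoids zero divisors).
        candidate = {name: rng.randint(low, high) for name in problem.numbers}
        key = tuple(sorted(candidate.items()))
        if key in seen:
            continue
        answer = problem.program(candidate)
        # Assumed validity filter: the unchanged program must yield a small,
        # non-negative whole-number answer so the word problem stays well posed.
        if answer < 0 or answer > max_answer or answer != int(answer):
            continue
        seen.add(key)
        variants.append({
            "question": problem.template.format(**candidate),
            "numbers": candidate,
            "answer": int(answer),
        })
    return variants
```

For example, a problem such as "Ana has {a} apples. She gives {b} to each of her {c} friends. How many apples are left?" can be paired with the program `lambda v: v["a"] - v["b"] * v["c"]`. Calling `remap_numbers` on it then yields variants that need exactly the same subtract-a-product reasoning with different small numbers. Rejecting candidates instead of repairing them keeps the sketch simple, at the cost of wasted samples when valid assignments are rare.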
