GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations · Event

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.07053v2 · Announce Type: replace

Abstract: Benchmarks like GSM8K are popular measures of mathematical reasoning, but leaderboard gains can overstate true capability due to memorization of fixed test sets. Most robustness variants apply surface-level perturbations (paraphrases, renamings, number swaps, distractors) that largely preserve the underlying facts, and static releases can themselves become memorized …
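To make the abstract's notion of a surface-level perturbation concrete, here is a minimal, hypothetical sketch of a renaming plus number swap applied to a GSM8K-style word problem. The template, the `solve` and `number_swap` helpers, and the seed are illustrative assumptions, not part of GSM-SEM, whose own generation method is not described in this excerpt. The point is that every variant changes the surface text but reuses the same solution procedure.

```python
import random
from dataclasses import dataclass


@dataclass
class Problem:
    question: str
    answer: int


# Hypothetical GSM8K-style template (not from GSM8K or GSM-SEM):
# slots for a name and two quantities.
TEMPLATE = (
    "{name} has {apples} apples and buys {bags} bags of 3 apples each. "
    "How many apples does {name} have now?"
)


def solve(apples: int, bags: int) -> int:
    # The underlying computation never changes: apples + 3 * bags.
    return apples + 3 * bags


def number_swap(name: str, rng: random.Random) -> Problem:
    """Surface-level variant: new name and numbers, same reasoning structure."""
    apples = rng.randint(2, 20)
    bags = rng.randint(1, 10)
    question = TEMPLATE.format(name=name, apples=apples, bags=bags)
    return Problem(question, solve(apples, bags))


original = Problem(TEMPLATE.format(name="Ava", apples=5, bags=2), solve(5, 2))
variant = number_swap("Ben", random.Random(0))
print(original.answer)  # 11
print(variant.question, "->", variant.answer)
```

Because the variant shares the original's solution structure, it illustrates what the abstract means by perturbations that "largely preserve the underlying facts."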

Related companies

SURF (company)
arXiv (nonprofit)
TERI (nonprofit)
Framework (company)
Elea (company)
ACT (nonprofit)
Mori (company)
Ratio (research institute)