How reliable are LLMs when it comes to playing dice?
Event: PRODUCT_LAUNCH, 2026-06-08, Impact: MEDIUM
arXiv:2606.07515v1, Announce Type: new
Abstract: We investigate the probabilistic reasoning capabilities of large language models through a controlled benchmarking study on discrete probability problems. We constructed two datasets: a set of standard exercises and a set of counterintuitive exercises designed to trigger heuristic reasoning. We evaluated 8 state-of-the-art models, each tested with and without Chain-of-Thought prom...
ArXiv CS.CL, 2026-06-08