CalArena: A Large-Scale Post-Hoc Calibration Benchmark

ArXiv CS.AI, 2026-05-29
Authors: Eugène Berta, David Holzmüller, Francis Bach, Michael I. Jordan

Abstract

arXiv:2605.30188v1 (announce type: cross)

Reliable probability estimates are critical in many machine learning applications, yet modern classifiers are often poorly calibrated. Post-hoc calibration provides a simple and widely used solution, but the large number of proposed methods, combined with small-scale and inconsistent evaluations, makes it difficult to determine which approaches are truly effective in practice. We introduce a large-scale, standardized benchmark for post-hoc calibration, covering nearly 2000 experiments across tabular and computer vision tasks, including binary, multiclass, and large-scale classification settings. Our benchmark aggregates predictions from a diverse set of classical models, modern deep learning architectures, and foundation models, and provides unified, reproducible implementations of dozens of calibration methods within a common evaluation framework.
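
For readers unfamiliar with the term, the following is a minimal sketch of one common post-hoc calibration method, temperature scaling, together with a binned expected calibration error (ECE) estimate. It is not taken from the CalArena codebase: the function names (`fit_temperature`, `expected_calibration_error`), the grid search, the bin count, and the synthetic overconfident classifier are all illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

def fit_temperature(logits, labels, grid=None):
    """Hypothetical helper: pick the temperature minimizing NLL on held-out data.

    A simple grid search is used here for clarity; gradient-based
    optimizers are a common alternative.
    """
    if grid is None:
        grid = np.linspace(0.05, 10.0, 400)
    losses = [nll(softmax(logits, t), labels) for t in grid]
    return float(grid[int(np.argmin(losses))])

def expected_calibration_error(probs, labels, n_bins=15):
    """Binned ECE on top-label confidence (equal-width bins)."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # bin weight times |accuracy - mean confidence| in the bin
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)

# Synthetic stand-in for a trained model (assumption): informative logits
# that are scaled up so the resulting probabilities are overconfident.
rng = np.random.default_rng(0)
n, k = 4000, 5
labels = rng.integers(0, k, size=n)
base = rng.normal(size=(n, k))
base[np.arange(n), labels] += 2.0
logits = 6.0 * base

# Fit on a calibration split, evaluate on a disjoint test split.
cal_logits, cal_labels = logits[:2000], labels[:2000]
test_logits, test_labels = logits[2000:], labels[2000:]

T = fit_temperature(cal_logits, cal_labels)
nll_before = nll(softmax(test_logits), test_labels)
nll_after = nll(softmax(test_logits, T), test_labels)
ece_before = expected_calibration_error(softmax(test_logits), test_labels)
ece_after = expected_calibration_error(softmax(test_logits, T), test_labels)

print(f"fitted temperature: {T:.2f}")
print(f"NLL before/after: {nll_before:.3f} / {nll_after:.3f}")
print(f"ECE before/after: {ece_before:.3f} / {ece_after:.3f}")
```

Because temperature scaling divides all logits by a single positive scalar, it leaves the predicted class and therefore accuracy unchanged; only the confidence levels move. The calibrator is fitted on data disjoint from the test split, which is the usual setup when comparing calibration methods.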
