EpiCurveBench: Evaluating VLMs on Epidemic Curve Digitization

ArXiv CS.CL | 2026-05-27 | Authors: Thomas Berkane, Maimuna S. Majumder

Abstract

arXiv:2605.27195v1 (announce type: new). Chart-to-data extraction with vision-language models (VLMs) is increasingly evaluated on benchmarks that show diminishing headroom (frontier VLMs exceed 89% on ChartQA) and with metrics that treat extracted points as unordered key-value pairs, ignoring the temporal structure of time series and penalizing small alignment shifts as catastrophic failures. We address both gaps with EpiCurveBench, a benchmark of 1,000 real-world epidemic curve images curated from diverse public-health sources, and EpiCurveSimilarity (ECS), an evaluation metric that aligns predicted and ground-truth series via dynamic programming, tolerating local temporal shifts and gaps while penalizing them proportionally. Evaluating six methods (three frontier closed VLMs, one open VLM, and two specialized chart-extraction systems), we find the strongest model reaches only 52.
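The abstract does not give the ECS formula, so the following is only a minimal sketch of the general idea it describes: aligning a predicted series to a ground-truth series with dynamic programming so that a small temporal shift costs a few gap penalties rather than a mismatch at every point. The function name `alignment_similarity`, the `gap_penalty` parameter, and the normalization by the peak value are assumptions made for illustration, not the authors' definition.

```python
from typing import Sequence


def alignment_similarity(
    pred: Sequence[float],
    truth: Sequence[float],
    gap_penalty: float = 0.5,
) -> float:
    """Illustrative alignment score in [0, 1]. NOT the ECS metric from the paper.

    Hypothetical design: a Needleman-Wunsch style global alignment where
    matching two points costs their normalized absolute difference and
    skipping a point in either series costs `gap_penalty`.
    """
    if not 0.0 < gap_penalty <= 1.0:
        raise ValueError("gap_penalty must be in (0, 1]")
    n, m = len(pred), len(truth)
    if n == 0 and m == 0:
        return 1.0

    # Normalize differences by the largest magnitude seen (assumption).
    peak = max([abs(v) for v in pred] + [abs(v) for v in truth])
    scale = peak if peak > 0 else 1.0

    # cost[i][j] = minimal cost of aligning pred[:i] with truth[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        cost[0][j] = j * gap_penalty

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = min(1.0, abs(pred[i - 1] - truth[j - 1]) / scale)
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,   # match the two points
                cost[i - 1][j] + gap_penalty,    # skip a predicted point
                cost[i][j - 1] + gap_penalty,    # skip a ground-truth point
            )

    # With costs capped at 1 per step, the total never exceeds max(n, m).
    return 1.0 - cost[n][m] / max(n, m)


if __name__ == "__main__":
    truth = [0, 1, 5, 9, 5, 1, 0]
    shifted = [1, 5, 9, 5, 1, 0, 0]  # same curve, one step early
    print(alignment_similarity(truth, truth))
    print(alignment_similarity(shifted, truth))
```

In this sketch a one-step shift of the whole curve is absorbed by two gaps (one at each end), so the penalty grows with the size of the shift instead of with the number of points that no longer line up index by index.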