RenoBench: A Citation Parsing Benchmark

arXiv cs.CL, 2026-06-02. Authors: Parth Sarin, Juan Pablo Alperin, Adam Buttrick, Dione Mentis

Abstract

arXiv:2603.25640v2 (announce type: replace-cross)

Accurate parsing of citations is necessary for machine-readable scholarly infrastructure. However, despite sustained interest in this problem, existing evaluation techniques are often not generalizable, rely on synthetic data, or are not publicly available. We introduce RenoBench, a public-domain benchmark for citation parsing, sourced from PDFs released through four publishing ecosystems: SciELO, Redalyc, the Public Knowledge Project, and Open Research Europe. Starting from 161,000 annotated citations, we apply automated validation and feature-based sampling to produce a dataset of 10,000 citations spanning multiple languages, publication types, and platforms. We then evaluate a variety of citation parsing systems and report field-level precision and recall. Our results show strong performance from language models, particularly when fine-tuned.
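The abstract reports field-level precision and recall but does not spell out the matching rules. The sketch below shows one common way to compute such scores, under stated assumptions: each citation is a dict mapping field names (e.g. `author`, `title`, `year`) to strings, gold and predicted citations are already aligned one-to-one, and a field counts as correct only on an exact match after whitespace and case normalization. The names `field_scores`, `FieldCounts`, and `normalize` are hypothetical and are not taken from RenoBench.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical field-level scorer. Assumption: exact match after
# whitespace/case normalization; RenoBench's actual rules may differ.


def normalize(value: str) -> str:
    """Collapse runs of whitespace and lowercase (a simplifying assumption)."""
    return " ".join(value.split()).lower()


@dataclass
class FieldCounts:
    correct: int = 0    # field predicted AND matches the gold value
    predicted: int = 0  # field present in the prediction
    gold: int = 0       # field present in the gold annotation

    @property
    def precision(self) -> float:
        # Convention: 0.0 when the system never predicted this field.
        return self.correct / self.predicted if self.predicted else 0.0

    @property
    def recall(self) -> float:
        # Convention: 0.0 when the gold data never contains this field.
        return self.correct / self.gold if self.gold else 0.0


def field_scores(
    gold: list[dict[str, str]], pred: list[dict[str, str]]
) -> dict[str, FieldCounts]:
    """Micro-averaged counts per field over aligned (gold, predicted) pairs."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned one-to-one")
    counts: dict[str, FieldCounts] = defaultdict(FieldCounts)
    for g, p in zip(gold, pred):
        for field in g:
            counts[field].gold += 1
        for field, value in p.items():
            counts[field].predicted += 1
            if field in g and normalize(g[field]) == normalize(value):
                counts[field].correct += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from RenoBench.
    gold = [
        {"author": "Doe, J.", "title": "A Toy Paper", "year": "2020"},
        {"author": "Roe, R.", "title": "Another Toy Paper"},
    ]
    pred = [
        {"author": "Doe, J.", "title": "a toy  paper", "year": "2021"},
        {"author": "Roe, R.", "year": "2019"},
    ]
    for field, c in sorted(field_scores(gold, pred).items()):
        print(f"{field}: P={c.precision:.2f} R={c.recall:.2f}")
    # prints:
    # author: P=1.00 R=1.00
    # title: P=1.00 R=0.50
    # year: P=0.00 R=0.00
```

Scoring per field rather than per whole citation means a single wrong `year` lowers only that field's precision instead of marking the entire parse as a failure, which makes it easier to see which fields a parser struggles with.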

