BullingerDB: A Dataset for Handwritten Text Recognition and Writer Retrieval

PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2605.30235v1. Announce type: new.

Abstract: We present BullingerDB, a large-scale benchmark dataset for historical document analysis based on the correspondence of Heinrich Bullinger (1504-1575). The corpus comprises 20,898 pages and 499,222 text lines written by 796 writers over six decades, featuring stylistic variation, multilingual content (mostly Latin and Early New High German) as well as meta-information suc