LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

arXiv cs.AI | 2026-06-04 | Author: Kwok Leong Tang

Abstract

arXiv:2606.04382v1 (announce type: cross)

Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but Library of Congress Subject Headings (LCSH) has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages drawn from the openly licensed Harvard, Columbia, and Princeton catalogs. A record enters the benchmark only when at least two independent cataloging agencies assigned LCSH to it; we release per-catalog provenance plus union and unanimous answer views. A concordance study of 465,187 works cataloged by all three libraries shows why this design matters: libraries usually agree on the underlying topic (93.3% share a concept-level heading) but often differ in exact expression (only 39.4% have identical heading sets). LCSHBench therefore scores both exact and concept matches, with set and rank metrics broken down by language and heading type, across open-vocabulary generation and full-vocabulary retrieval.
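To make the union and unanimous answer views and the exact versus concept scoring more concrete, here is a minimal Python sketch. It is not the benchmark's implementation. The function names (`union_view`, `unanimous_view`, `to_concept`, `set_prf`) and the sample headings are hypothetical, and the concept mapping is an assumption: the abstract does not say how concept-level matches are computed, so this sketch simply treats the main heading before the first `--` subdivision as the concept.

```python
from typing import Dict, Set, Tuple


def union_view(per_catalog: Dict[str, Set[str]]) -> Set[str]:
    """Headings assigned by at least one contributing catalog."""
    return set().union(*per_catalog.values())


def unanimous_view(per_catalog: Dict[str, Set[str]]) -> Set[str]:
    """Headings assigned by every contributing catalog (needs >= 1 catalog)."""
    return set.intersection(*per_catalog.values())


def to_concept(heading: str) -> str:
    # ASSUMPTION: the "concept" is the main heading before any "--"
    # subdivision, lowercased. The real benchmark may define it differently.
    return heading.split("--", 1)[0].strip().lower()


def set_prf(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 between two heading sets."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented example record held by two catalogs.
per_catalog = {
    "harvard": {"Whales--Behavior", "Marine mammals"},
    "columbia": {"Whales--Behavior"},
}
predicted = {"Whales--Migration", "Marine mammals"}

gold_union = union_view(per_catalog)
gold_unanimous = unanimous_view(per_catalog)

# Exact match compares heading strings as given.
exact_scores = set_prf(predicted, gold_union)

# Concept match compares headings after mapping each one to its concept.
concept_scores = set_prf(
    {to_concept(h) for h in predicted},
    {to_concept(h) for h in gold_union},
)

print("exact:", exact_scores)
print("concept:", concept_scores)
```

In this toy record the prediction gets the topic right ("Whales") but picks a different subdivision, so it scores lower on exact match than on concept match. That is the same gap the concordance figures describe between topic agreement and identical heading sets.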
