Do not be greedy, Think Twice: Sampling and Selection for Document-level Information Extraction

ArXiv CS.CL | 2026-05-29 | Authors: Mikel Zubillaga, Oscar Sainz, Oier Lopez de Lacalle, Eneko Agirre

Abstract

arXiv:2601.18395v2. Document-level Information Extraction (DocIE) aims to produce an output template with the entities, relations, and events of interest occurring in a given document. Standard practice is to prompt decoder-only LLMs with greedy decoding to avoid output variability. Rather than treating this variability as a limitation, we show that sampling can produce substantially better solutions than greedy decoding, especially when using reasoning models. We thus propose ThinkTwice, a sampling and selection framework in which the LLM generates multiple candidate templates for a given document and a selection module chooses the most suitable one. We introduce both an unsupervised selection method that exploits agreement across the generated outputs and a supervised selection method that uses reward models trained on labeled DocIE data.
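
The agreement-based selection idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not specify how agreement is measured, so the sketch assumes each sampled template is flattened into a set of hypothetical (slot, value) pairs and picks the candidate with the highest mean pairwise F1 against the other samples. The names `Fact`, `Template`, `f1`, and `select_by_agreement` are invented for illustration.

```python
from typing import FrozenSet, List, Tuple

# Assumption: a template is flattened into a set of (slot, value) pairs.
Fact = Tuple[str, str]
Template = FrozenSet[Fact]


def f1(a: Template, b: Template) -> float:
    """F1 overlap between two templates, treating b as the reference."""
    if not a and not b:
        return 1.0  # two empty templates agree perfectly
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(a)
    recall = overlap / len(b)
    return 2 * precision * recall / (precision + recall)


def select_by_agreement(candidates: List[Template]) -> int:
    """Return the index of the candidate that agrees most with the others."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    if len(candidates) == 1:
        return 0
    best_index, best_score = 0, -1.0
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        score = sum(f1(cand, other) for other in others) / len(others)
        if score > best_score:  # ties keep the earliest candidate
            best_index, best_score = i, score
    return best_index


# Three sampled templates for one document (toy data, not from the paper).
candidates = [
    frozenset({("perpetrator", "group A"), ("target", "embassy")}),
    frozenset({("perpetrator", "group A"), ("target", "embassy")}),
    frozenset({("perpetrator", "group B")}),
]
chosen = select_by_agreement(candidates)
```

In this toy run the first two samples agree with each other and the third is an outlier, so one of the agreeing templates is selected. A supervised variant, as the abstract describes, would replace the pairwise score with a score from a trained reward model.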
