Doc-CoB: Enhancing Document Understanding with Visual Chain-of-Boxes Reasoning

Event: PRODUCT_LAUNCH | Date: 2026-05-27 | Impact: MEDIUM

arXiv:2505.18603v2 | Announce Type: replace-cross

Abstract: Document understanding aims to perform question answering and information extraction over document images, where the visual content is highly information-dense and most queries rely on only a few relevant layout regions. However, existing methods either adopt a one-pass strategy that implicitly assumes all layouts are equally important, or focus excessively on