SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL (Event)

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2512.04069v2 | Announce Type: replace

Abstract: Vision Language Models (VLMs) demonstrate strong qualitative visual understanding, but struggle with metrically precise spatial reasoning required for embodied applications. The agentic paradigm promises that VLMs can use a wide variety of tools that could augment these capabilities, such as depth estimators, segmentation models, and pose estimators. Yet it remains an open