Perceive-then-Plan: Layout-as-Policy for Monocular 3D Scene Layout Estimation

ArXiv CS.CV · 2026-05-26 · NEWS · en · Authors: Junwei Zhou, Yu-Wing Tai

Abstract

arXiv:2605.25326v1 (announce type: new)

Building structured 3D scene layouts from a single image requires reconciling visual observations with physical and spatial constraints, a challenge that is difficult to address with direct prediction alone. In this work, we formulate monocular 3D layout estimation as a perceive-then-plan problem with vision-language models, where a Perceiver first grounds the 3D objects and then a Planner iteratively refines the scene hypothesis through actions that improve physical plausibility while preserving consistency with the input image. We propose Layout-as-Policy (LaP), which casts the planning stage as a policy learning problem: 3D layouts are represented as structured states, and refined via discrete actions such as translation, rotation, and rescaling.
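To make the "structured state plus discrete actions" idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the state encoding, the action vocabulary beyond translation, rotation, and rescaling, or the step sizes. The names `Box3D`, `Action`, and `apply_action`, the box parameterization (center, size, yaw in degrees), and all numeric values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Box3D:
    """Hypothetical per-object state: an upright 3D box with a heading angle."""
    label: str
    center: Tuple[float, float, float]  # (x, y, z), assumed to be in metres
    size: Tuple[float, float, float]    # (width, depth, height)
    yaw: float                          # heading in degrees, kept in [0, 360)


@dataclass(frozen=True)
class Action:
    """Hypothetical discrete edit applied to one object in the layout."""
    index: int           # which object to edit
    kind: str            # "translate", "rotate", or "rescale"
    amount: float        # offset, angle in degrees, or scale factor
    axis: int = 0        # only used by "translate" (0=x, 1=y, 2=z)


def apply_action(layout: List[Box3D], action: Action) -> List[Box3D]:
    """Return a new layout with one action applied; the input is not mutated."""
    box = layout[action.index]
    if action.kind == "translate":
        center = list(box.center)
        center[action.axis] += action.amount
        edited = replace(box, center=tuple(center))
    elif action.kind == "rotate":
        edited = replace(box, yaw=(box.yaw + action.amount) % 360.0)
    elif action.kind == "rescale":
        edited = replace(box, size=tuple(s * action.amount for s in box.size))
    else:
        raise ValueError(f"unknown action kind: {action.kind}")
    refined = list(layout)
    refined[action.index] = edited
    return refined


# Example: refine a one-object scene hypothesis with three edits in sequence.
layout = [Box3D("chair", (0.0, 0.0, 0.5), (1.0, 1.0, 1.0), 350.0)]
for step in (
    Action(index=0, kind="translate", amount=0.5, axis=0),
    Action(index=0, kind="rotate", amount=20.0),
    Action(index=0, kind="rescale", amount=0.5),
):
    layout = apply_action(layout, step)
```

In the paper's framing, a learned Planner policy would choose which action to apply at each step; the fixed action sequence above only stands in for that choice. Keeping states immutable makes each refinement step easy to log and compare, which is a design choice of this sketch rather than something the abstract states.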
