Perceive-then-Plan: Layout-as-Policy for Monocular 3D Scene Layout Estimation

arXiv cs.CV | 2026-05-26 | Authors: Junwei Zhou, Yu-Wing Tai

Abstract

arXiv:2605.25326v1 (announce type: new)

Building a structured 3D scene layout from a single image requires reconciling visual observations with physical and spatial constraints, which is difficult to achieve with direct prediction alone. In this work, we formulate monocular 3D layout estimation as a perceive-then-plan problem with vision-language models: a Perceiver first grounds the 3D objects, and a Planner then iteratively refines the scene hypothesis through actions that improve physical plausibility while preserving consistency with the input image. We propose Layout-as-Policy (LaP), which casts the planning stage as a policy learning problem: 3D layouts are represented as structured states and refined via discrete actions such as translation, rotation, and rescaling.
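To make the state-and-action framing concrete, here is a minimal sketch of a layout held as a structured state and edited by discrete actions. It is not the authors' implementation: the names `Box3D`, `Action`, `apply_action`, and `floor_penetration` are hypothetical, the box parameterization (center, size, yaw) and the z-up convention are assumptions, and the floor-penetration score is only a toy stand-in for "physical plausibility". The policy that chooses actions is not modeled.

```python
import math
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Box3D:
    """One object in the layout (hypothetical parameterization)."""
    label: str
    center: Tuple[float, float, float]  # (x, y, z), z assumed to point up
    size: Tuple[float, float, float]    # full extents along x, y, z
    yaw: float                          # rotation about the z axis, in radians


@dataclass(frozen=True)
class Action:
    """A discrete edit applied to a single object (hypothetical encoding)."""
    index: int           # which object in the layout to edit
    kind: str            # "translate", "rotate", or "rescale"
    amount: float        # step in metres, radians, or a scale factor
    axis: int = 0        # axis for "translate" (0=x, 1=y, 2=z)


Layout = Tuple[Box3D, ...]


def apply_action(layout: Layout, action: Action) -> Layout:
    """Return a new layout with one object edited; the input is unchanged."""
    box = layout[action.index]
    if action.kind == "translate":
        center = list(box.center)
        center[action.axis] += action.amount
        edited = replace(box, center=tuple(center))
    elif action.kind == "rotate":
        edited = replace(box, yaw=(box.yaw + action.amount) % (2 * math.pi))
    elif action.kind == "rescale":
        if action.amount <= 0:
            raise ValueError("scale factor must be positive")
        edited = replace(box, size=tuple(s * action.amount for s in box.size))
    else:
        raise ValueError(f"unknown action kind: {action.kind}")
    return layout[:action.index] + (edited,) + layout[action.index + 1:]


def floor_penetration(layout: Layout) -> float:
    """Toy plausibility score: total depth by which boxes sink below z = 0."""
    return sum(max(0.0, -(b.center[2] - b.size[2] / 2)) for b in layout)


# A chair hypothesis whose bottom face sits 0.25 m below the floor.
layout = (Box3D("chair", (0.0, 0.0, 0.25), (1.0, 1.0, 1.0), 0.0),)
lifted = apply_action(layout, Action(index=0, kind="translate", amount=0.25, axis=2))
```

In this sketch, `floor_penetration(layout)` is positive for the initial hypothesis and drops to zero for `lifted`. Keeping the state immutable means every refinement step yields a new layout that can be scored or discarded without side effects.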
