Details
- Source site
- ArXiv CS.CV
- Authors
- Ulrich Prestel, Stefan Andreas Baumann, Nick Stracke, Björn Ommer
- Article type
- NEWS
- Language
- en
- Publication date
- 2026-06-01
Abstract
arXiv:2605.31535v1 (Announce Type: new)
Self-supervised novel view synthesis (NVS) remains challenging to scale despite the abundance of video data, largely because training on realistic videos is brittle and multi-network system designs have hard-to-predict scaling behavior. We introduce RayDer, a unified, feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone, turning self-supervised NVS into a well-posed single-model scaling problem. A minimal dynamic state, treated as a nuisance factor, absorbs time-varying content and enables stable training on unconstrained real-world video. Importantly, RayDer keeps static-scene NVS as its target task: dynamic content is leveraged purely as scalable supervision and is not reconstructed, as it would be in dynamic-scene (4D) NVS.