Event-to-Video Reconstruction using Spatio-Temporal and Frequency-Enhanced Deep Neural Networks

ArXiv CS.CV | 2026-05-26 | Authors: Ramna Maqsood, Paulo Nunes, Luís Ducla Soares, Caroline Conti

Abstract

arXiv:2605.25804v1

Event cameras offer significant advantages over conventional frame-based counterparts, including high temporal resolution, low latency, and energy efficiency. These characteristics make them suitable for high-speed and high-dynamic-range scene acquisition; however, the lack of dense intensity frames limits the direct applicability of conventional computer vision methods for scene understanding. Event-to-video (E2V) reconstruction seeks to bridge this gap by converting asynchronous event streams into a sequence of synchronous video frames. Existing E2V reconstruction methods based on convolutional neural networks and transformers operate primarily in the spatial domain and often struggle to recover fine structural details while suppressing severe reconstruction artifacts. To address these issues, we propose MSFET-E2V, a novel multiscale frequency-enhanced transformer model.
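To make the E2V task concrete, the sketch below shows the simplest way to turn an asynchronous event stream into synchronous frames: bin events into fixed time windows and sum their polarities per pixel. This is a naive baseline for illustration only, not MSFET-E2V or any method from the paper. The event layout `(timestamp, x, y, polarity)` and the function name `accumulate_events` are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed event layout: (timestamp in seconds, x, y, polarity in {+1, -1}).
Event = Tuple[float, int, int, int]


def accumulate_events(
    events: List[Event],
    width: int,
    height: int,
    t_start: float,
    t_end: float,
    num_frames: int,
) -> List[List[List[int]]]:
    """Bin events into equal time windows and sum polarities per pixel.

    Returns a list of num_frames frames, each indexed as frame[y][x].
    """
    if num_frames <= 0 or t_end <= t_start:
        raise ValueError("need num_frames > 0 and t_end > t_start")

    frames = [[[0] * width for _ in range(height)] for _ in range(num_frames)]
    window = (t_end - t_start) / num_frames

    for t, x, y, polarity in events:
        # Skip events outside the time range or the sensor area.
        if not (t_start <= t < t_end):
            continue
        if not (0 <= x < width and 0 <= y < height):
            continue
        # Clamp guards against floating-point rounding at the upper edge.
        index = min(int((t - t_start) / window), num_frames - 1)
        frames[index][y][x] += polarity

    return frames


# Four events on a 2x2 sensor, split into two 10 ms windows.
events = [(0.001, 0, 0, +1), (0.004, 1, 0, -1), (0.012, 0, 0, +1), (0.019, 1, 1, +1)]
frames = accumulate_events(events, width=2, height=2, t_start=0.0, t_end=0.02, num_frames=2)
```

Frames produced this way encode only brightness changes, not absolute intensity, which is why learned E2V models are used to recover dense intensity images from such event representations.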