Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction

We present a super-fast convergence approach to reconstructing the per-scene radiance field from a set of images that capture the scene with known poses. This task, which is often applied to novel view synthesis, is recently revolution-ized by Neural Radiance Field (NeRF) for its state-of-the-art qu...

Full description

Saved in:
Bibliographic Details
Published in:2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 5449 - 5459
Main Authors: Sun, Cheng, Sun, Min, Chen, Hwann-Tzong
Format: Conference Proceeding
Language:English
Published: IEEE 01-01-2022
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We present a super-fast convergence approach to reconstructing the per-scene radiance field from a set of images that capture the scene with known poses. This task, which is often applied to novel view synthesis, is recently revolution-ized by Neural Radiance Field (NeRF) for its state-of-the-art quality and fiexibility. However, NeRF and its variants require a lengthy training time ranging from hours to days for a single scene. In contrast, our approach achieves NeRF-comparable quality and converges rapidly from scratch in less than 15 minutes with a single GPU. We adopt a representation consisting of a density voxel grid for scene geometry and a feature voxel grid with a shallow network for complex view-dependent appearance. Modeling with explicit and discretized volume representations is not new, but we propose two simple yet non-trivial techniques that contribute to fast convergence speed and high-quality output. First, we introduce the post-activation interpolation on voxel density, which is capable of producing sharp surfaces in lower grid resolution. Second, direct voxel density optimization is prone to suboptimal geometry solutions, so we robustify the optimization process by imposing several priors. Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, yet it only takes about 15 minutes to train from scratch for a new scene. Code: https://github.com/sunset1995/DirectVoxGO.
ISSN:2575-7075
DOI:10.1109/CVPR52688.2022.00538