Distributed Training of Neural Radiance Fields: A Performance Characterization

Bibliographic Details
Published in: 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 319–321
Main Authors: Zhao, Adrian; Zhang, Louis; Durvasula, Sankeerth; Chen, Fan; Jain, Nilesh; Panneer, Selvakumar; Vijaykumar, Nandita
Format: Conference Proceeding
Language: English
Published: IEEE, 05-05-2024
Description
Summary: Implicit neural representation is an emerging method that leverages deep neural networks and learned parameters to represent 3D scenes efficiently and accurately. Neural radiance field (NeRF) is a state-of-the-art implicit representation that achieves photorealistic 3D reconstruction with compact neural network models. However, as the complexity and scale of the scene increase, training NeRF models with a single GPU proves insufficient for achieving fast training and high-quality reconstruction. To address this challenge, prior works proposed distributed NeRF training methods. This is the first work to conduct a detailed evaluation of two major distributed NeRF training methods and their tradeoffs: distributed data parallel (DDP) and spatial segmentation (SS). We find that DDP training requires cross-device synchronization during training, while SS training incurs additional fusion overhead during inference. Our analysis also reveals that sampling input images is a common key bottleneck in distributed NeRF training. At the beginning of each training iteration, the CPU generates input batches for all GPUs in the cluster by sampling all images in the dataset, causing significant stalls that constitute up to 43.3% of the total training time. To alleviate this bottleneck, we propose a pipelined input sampling strategy that precomputes input samples on the CPU concurrently with model training on the GPUs. Our evaluation demonstrates an average speedup in training time of 1.95× (up to 2.24×).
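
The pipelined input sampling strategy summarized above can be pictured as a producer/consumer loop. The sketch below is a hypothetical Python/PyTorch illustration, not the authors' implementation: the names sample_batch and train_pipelined, the toy MLP, and all tensor shapes are placeholder assumptions. A CPU thread precomputes ray batches for upcoming iterations into a bounded queue while the GPU consumes the current batch, hiding the sampling stall behind model training.

import queue
import threading

import torch


def sample_batch(num_rays):
    # Stand-in for the CPU-side sampling step the paper identifies as the
    # bottleneck: drawing rays/pixels across all images in the dataset.
    rays = torch.randn(num_rays, 6)   # placeholder ray origins + directions
    rgb = torch.rand(num_rays, 3)     # placeholder ground-truth colors
    return rays, rgb


def train_pipelined(model, optimizer, num_iters, num_rays=4096, depth=2):
    device = next(model.parameters()).device
    batches = queue.Queue(maxsize=depth)  # bounded buffer of precomputed batches

    def producer():
        # Runs on the CPU concurrently with GPU training: precompute input
        # samples for upcoming iterations (the pipelining idea).
        for _ in range(num_iters):
            batches.put(sample_batch(num_rays))

    threading.Thread(target=producer, daemon=True).start()

    for _ in range(num_iters):
        rays, rgb = batches.get()        # usually ready, so no sampling stall
        rays, rgb = rays.to(device), rgb.to(device)
        loss = torch.nn.functional.mse_loss(model(rays), rgb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


# Example usage with a toy MLP standing in for a NeRF model:
model = torch.nn.Sequential(
    torch.nn.Linear(6, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3)
)
train_pipelined(model, torch.optim.Adam(model.parameters(), lr=5e-4), num_iters=10)

The bounded queue keeps the producer at most depth iterations ahead, capping host memory use while still overlapping CPU sampling with GPU training.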
ISSN: 2766-0486
DOI: 10.1109/ISPASS61541.2024.00044