3D-Aware Encoding for Style-based Neural Radiance Fields
Main Authors: , , , , , , , ,
Format: Journal Article
Language: English
Published: 12-11-2022
Online Access: Get full text
Summary: We tackle the task of NeRF inversion for style-based neural radiance fields (e.g., StyleNeRF). In this task, we aim to learn an inversion function that projects an input image to the latent space of a NeRF generator and then synthesizes novel views of the original image from the latent code. Compared with GAN inversion for 2D generative models, NeRF inversion must not only 1) preserve the identity of the input image, but also 2) ensure 3D consistency in the generated novel views. This requires the latent code obtained from a single-view image to be invariant across multiple views. To address this new challenge, we propose a two-stage encoder for style-based NeRF inversion. In the first stage, a base encoder converts the input image to a latent code; to make the latent code view-invariant and capable of synthesizing 3D-consistent novel views, we train the base encoder with identity contrastive learning. In the second stage, to better preserve the identity of the input image, a refining encoder refines the latent code and adds finer details to the output image. Notably, the novelty of the model lies in the design of the first-stage encoder, which produces a latent code close to the latent manifold, so that the refinement in the second stage also stays close to the NeRF manifold. Through extensive experiments, we demonstrate that the proposed two-stage encoder outperforms existing inversion encoders, both qualitatively and quantitatively, in image reconstruction and novel-view rendering.
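The identity contrastive learning described in the summary can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch version (not the authors' code): latent codes predicted from two different views of the same subject form a positive pair, while codes of the other subjects in the batch serve as negatives, in the style of an InfoNCE objective. The function name, temperature value, and batch layout are all assumptions.

```python
import torch
import torch.nn.functional as F

def identity_contrastive_loss(z_a: torch.Tensor,
                              z_b: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """z_a, z_b: (B, D) latent codes predicted from two views of the same B subjects."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    # (B, B) cosine similarities between codes across the two views.
    logits = z_a @ z_b.t() / temperature
    # Row i's positive is column i (same identity, different view);
    # all other columns are negatives (different identities).
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)
```

Minimizing this loss pushes codes from different views of the same subject together, which is one way to realize the view-invariance the summary asks of the first-stage encoder.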
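Likewise, the two-stage design can be summarized as a forward pass: the base encoder predicts a coarse, view-invariant code, and the refining encoder predicts a residual from the input image and its coarse reconstruction. This is a plausible reading of the summary, not the paper's actual architecture; all module names, signatures, and the residual formulation are hypothetical.

```python
import torch
import torch.nn as nn

class TwoStageInverter(nn.Module):
    """Hypothetical two-stage encoder for style-based NeRF inversion."""

    def __init__(self, base_encoder: nn.Module,
                 refining_encoder: nn.Module,
                 generator: nn.Module):
        super().__init__()
        self.base = base_encoder        # image -> coarse latent code
        self.refiner = refining_encoder # (image, coarse render) -> latent residual
        self.generator = generator      # pretrained StyleNeRF-like generator
        self.generator.requires_grad_(False)  # keep the generator frozen

    def forward(self, image: torch.Tensor, cam_pose: torch.Tensor):
        # Stage 1: coarse, view-invariant code close to the latent manifold.
        w = self.base(image)
        coarse = self.generator(w, cam_pose)  # reconstruct the input view
        # Stage 2: residual refinement adds identity details while the
        # small offset keeps the final code near the manifold.
        delta_w = self.refiner(torch.cat([image, coarse], dim=1))
        w_refined = w + delta_w
        return self.generator(w_refined, cam_pose), w_refined
```

Under this reading, novel-view synthesis amounts to rendering the refined code with a different camera pose, e.g. `generator(w_refined, new_pose)`, relying on the view-invariance encouraged by the contrastive training above.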
DOI: 10.48550/arxiv.2211.06583