Taming Latent Diffusion Model for Neural Radiance Field Inpainting

Abstract

Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Although some recent work has shown preliminary success in editing a reconstructed NeRF with a diffusion prior, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of the content synthesized by the diffusion model, which hinders the radiance field from converging to a crisp and deterministic geometry. Moreover, applying latent diffusion models to real data often yields a textural shift, incoherent with the image condition, that is caused by auto-encoding errors. Both problems are further exacerbated by the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. In our analyses, we also found that the commonly used pixel and perceptual losses are harmful in the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes.
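Masked adversarial training is only named above. As a minimal sketch of the general idea, assuming a discriminator that outputs per-pixel (or per-patch) logits and a binary inpainting mask, the standard non-saturating GAN losses can be averaged over masked locations only, so that regions outside the mask contribute nothing. Which images play the "real" and "fake" roles, and how the paper actually applies the mask, are our assumptions; the function names are hypothetical.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def masked_adversarial_losses(d_real, d_fake, mask, eps=1e-8):
    """Non-saturating GAN losses averaged over masked locations only.

    d_real, d_fake: discriminator logits, shape (H, W).
      Assumption: "real" = diffusion-inpainted target, "fake" = NeRF rendering.
    mask: 1 inside the inpainting region, 0 elsewhere, shape (H, W).
    """
    m = mask.astype(np.float64)
    denom = m.sum() + eps
    # Discriminator loss: -log sigmoid(D(real)) - log(1 - sigmoid(D(fake))),
    # counted only where the mask is 1.
    d_loss = ((softplus(-d_real) + softplus(d_fake)) * m).sum() / denom
    # Generator (NeRF) loss: -log sigmoid(D(fake)) inside the mask.
    g_loss = (softplus(-d_fake) * m).sum() / denom
    return d_loss, g_loss
```

Because unmasked logits are multiplied by zero, anything the discriminator notices outside the hole (for example, a texture mismatch in regions the NeRF already reconstructs from real views) cannot drive the loss.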

MALD-NeRF

Our per-scene customization effectively steers the latent diffusion model toward synthesizing consistent, in-context content across views.
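The page does not give the customization recipe. In spirit, per-scene customization means fine-tuning the pretrained diffusion model on the scene's own captured images. As a hedged sketch, the standard epsilon-prediction denoising loss that such fine-tuning would typically minimize is shown below, with latents of the scene's views standing in for `z0`. The linear noise schedule, the function names, and the omission of the mask and masked-image conditioning an inpainting model would also receive are all assumptions.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Cumulative product of (1 - beta_t) for a linear DDPM noise schedule.
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def denoising_loss(denoiser, z0, t, alpha_bar, rng):
    """Epsilon-prediction loss for one clean latent z0 at timestep t.

    denoiser: hypothetical callable (z_t, t) -> predicted noise.
    """
    noise = rng.standard_normal(z0.shape)
    a = alpha_bar[t]
    # Forward diffusion: z_t = sqrt(a) * z0 + sqrt(1 - a) * noise.
    z_t = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * noise
    pred = denoiser(z_t, t)
    # Mean squared error between predicted and true noise.
    return float(np.mean((pred - noise) ** 2))
```

Minimizing this loss over the denoiser's parameters, using only latents from one scene, narrows what the model tends to generate, which is the intuition behind tempering its stochasticity.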

We present results on the SPIn-NeRF and LLFF datasets. Note that the LLFF dataset has no ground-truth views with the object physically removed; therefore, we only report C-FID and C-KID on these scenes. The best performance is underlined.
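For readers unfamiliar with KID: it is an unbiased estimate of the squared maximum mean discrepancy (MMD) between two sets of image features under the cubic polynomial kernel k(x, y) = (x·y/d + 1)^3. The sketch below assumes the features are already extracted (in practice from an Inception network, averaged over several random subsets). The page does not define the "C-" prefix; we assume it denotes computing the metric on a restricted region such as a crop around the mask, a preprocessing step this sketch leaves out. Function names are hypothetical.

```python
import numpy as np

def polynomial_kernel(a, b):
    # k(x, y) = (x . y / d + 1) ** 3, the kernel used by KID.
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def kid(feats_x, feats_y):
    """Unbiased squared-MMD estimate between two feature sets (rows = samples)."""
    m, n = len(feats_x), len(feats_y)
    k_xx = polynomial_kernel(feats_x, feats_x)
    k_yy = polynomial_kernel(feats_y, feats_y)
    k_xy = polynomial_kernel(feats_x, feats_y)
    # Exclude the i == j diagonal terms for the unbiased within-set averages.
    mean_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    mean_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(mean_xx + mean_yy - 2.0 * k_xy.mean())
```

Because the estimator is unbiased, KID can come out slightly negative when the two feature sets share a distribution, which is why small values are reported with several decimals.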


Ablation study (✅ = component used; lower FID and KID are better):

Variant	LPIPS loss	Adversarial loss	Other change	FID (↓)	KID (↓)
A	✅			192.86	0.0447
B	✅	✅		185.79	0.0419
C		✅	w/o Per-Scene Customization	224.29	0.0596
D		✅	w/o Feature Matching	232.28	0.0716
E		✅	w/o Adv Masking	196.47	0.0472
Ours (full)		✅		183.25	0.0397
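The table reports FID alongside KID. FID fits a Gaussian to each set of image features and takes the Fréchet distance between the two fits: ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2)). The NumPy-only sketch below assumes precomputed features (normally Inception activations) and uses the identity Tr((cov1 cov2)^(1/2)) = Tr((s1 cov2 s1)^(1/2)) with s1 = cov1^(1/2), so that only symmetric eigendecompositions are needed. Function names are hypothetical.

```python
import numpy as np

def _sqrtm_psd(mat):
    # Square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2))."""
    diff = mu1 - mu2
    s1 = _sqrtm_psd(cov1)
    # s1 @ cov2 @ s1 is similar to cov1 @ cov2 but symmetric PSD,
    # so the trace of its square root is the sum of sqrt(eigenvalues).
    middle = s1 @ cov2 @ s1
    eig = np.clip(np.linalg.eigvalsh((middle + middle.T) / 2.0), 0.0, None)
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2)
                 - 2.0 * np.sqrt(eig).sum())

def fid(feats_x, feats_y):
    """FID between two feature sets (rows = samples)."""
    return frechet_distance(feats_x.mean(axis=0), np.cov(feats_x, rowvar=False),
                            feats_y.mean(axis=0), np.cov(feats_y, rowvar=False))
```

For example, two unit-covariance Gaussians in three dimensions whose means differ by 1 in every coordinate have a Fréchet distance of 3: the covariance terms cancel and only ||mu1 - mu2||^2 remains.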


Per-Scene Customization

Evaluation on SPIn-NeRF and LLFF Datasets

Comparing with Related Methods

Figure panels: example image, example mask, and our inpainting result.

Ablation Study

Figure panels: Ours (full) and Ablation Variant A.
