FrugalNeRF:
Fast Convergence for Few-shot Novel View Synthesis without Learned Priors

1National Yang Ming Chiao Tung University,   2University of Illinois Urbana-Champaign,   3NVIDIA Research     * equal contribution
arXiv 2024
arXiv · Code [coming soon]

FrugalNeRF turns just two images and 10 minutes into high-quality 3D scenes.

Abstract

Neural Radiance Fields (NeRF) face significant challenges in few-shot scenarios, primarily due to overfitting and long training times for high-fidelity rendering. Existing methods, such as FreeNeRF and SparseNeRF, use frequency regularization or pre-trained priors but struggle with complex scheduling and bias. We introduce FrugalNeRF, a novel few-shot NeRF framework that leverages weight-sharing voxels across multiple scales to efficiently represent scene details. Our key contribution is a cross-scale geometric adaptation scheme that selects pseudo ground truth depth based on reprojection errors across scales. This guides training without relying on externally learned priors, enabling full utilization of the training data. It can also integrate pre-trained priors, enhancing quality without slowing convergence. Experiments on LLFF, DTU, and RealEstate-10K show that FrugalNeRF outperforms other few-shot NeRF methods while significantly reducing training time, making it a practical solution for efficient and accurate 3D scene reconstruction.


Our proposed framework

(a) Our FrugalNeRF represents a scene with a pair of density and appearance voxels \((\mathbf{V}^\mathrm{D}, \mathbf{V}^\mathrm{A})\). For clearer illustration, we show only one voxel in the figure. (b) We sample rays not only from the training input views \(\mathbf{r}_\mathrm{train}\) but also from randomly sampled novel views \(\mathbf{r}_\mathrm{novel}\). (c) We then create \(L + 1\) multi-scale voxels by hierarchical subsampling: lower-resolution voxels enforce globally consistent geometry and reduce overfitting but struggle to represent detailed structures, while higher-resolution voxels capture fine details but may get stuck in local minima or generate floaters. (d) For the rays from training views \(\mathbf{r}_\mathrm{train}\), we enforce an MSE reconstruction loss \(\mathcal{L}_\mathrm{recon}\) between the volume-rendered RGB color \(\hat{C}^l\) and the input RGB \(C\) at each scale \(l\). (e) We introduce a cross-scale geometric adaptation loss \(\mathcal{L}_\mathrm{adapt}\) for the novel-view rays \(\mathbf{r}_\mathrm{novel}\): we warp the volume-rendered RGB to the nearest training view using the predicted depth, compute the reprojection error \(e^l\) at each scale, and use the depth with the minimum reprojection error as pseudo ground truth (pseudo-GT) for depth supervision. This adaptation involves rays from both training and novel views, though the figure only depicts novel-view rays for clarity.
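To make the selection rule in (e) concrete, below is a minimal PyTorch-style sketch of the per-ray pseudo-GT depth selection across scales. The helpers `render_ray` and `project_to_view`, the tensor shapes, and the L1 color error are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def select_pseudo_gt_depth(ray_o, ray_d, voxels, train_img, train_K, train_c2w):
    """Per novel-view ray, pick the rendered depth whose warp into the nearest
    training view yields the smallest color reprojection error."""
    depths, errors = [], []
    for voxel_l in voxels:                                 # L+1 scales, coarse to fine
        # render_ray / project_to_view are placeholder helpers for this sketch.
        rgb_l, depth_l = render_ray(voxel_l, ray_o, ray_d)           # (N,3), (N,)
        pts = ray_o + depth_l.unsqueeze(-1) * ray_d                  # ray termination points (N,3)
        uv = project_to_view(pts, train_K, train_c2w)                # pixel coords in [-1,1], (N,2)
        warped = F.grid_sample(train_img[None], uv[None, :, None, :],
                               align_corners=True)[0, :, :, 0].T     # colors sampled from the training view, (N,3)
        errors.append((rgb_l - warped).abs().mean(dim=-1))           # reprojection error e^l, (N,)
        depths.append(depth_l)
    depths = torch.stack(depths)                           # (L+1, N)
    errors = torch.stack(errors)                           # (L+1, N)
    best = errors.argmin(dim=0, keepdim=True)              # winning scale per ray, (1, N)
    pseudo_depth = depths.gather(0, best).squeeze(0)       # pseudo-GT depth, (N,)
    return pseudo_depth.detach(), errors.min(dim=0).values
```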



Baseline Comparisons on LLFF dataset




TensoRF (left) vs. FrugalNeRF (right), trained on 2 views.

Scenes: LLFF/Fern, LLFF/Flower, LLFF/Horns, LLFF/Leaves, LLFF/Trex


Baseline Comparisons on DTU dataset




TensoRF (left) vs. FrugalNeRF (right), trained on 2 views.

Scenes: DTU/scan82, DTU/scan114, DTU/scan21, DTU/scan41, DTU/scan55


Baseline Comparisons on RealEstate-10K dataset




TensoRF (left) vs. FrugalNeRF (right), trained on 2 views.

Scenes: RealEstate/00000, RealEstate/00001


Cross-scale geometric adaptation during training



Low-resolution voxels initially guide geometry learning, with higher resolutions contributing more over time. This enables autonomous frequency tuning and better generalization.



Effect of the multi-scale voxel color loss


w/o multi-scale voxel color loss
w/ multi-scale voxel color loss

With the multi-scale voxel color loss, the model leverages multiple levels of detail in the scene, leading to better rendering quality and geometry.
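For illustration, a minimal PyTorch-style sketch of this loss is shown below: the same full-resolution voxel is hierarchically subsampled into L+1 weight-sharing scales, each scale is volume-rendered for the training rays, and per-scale MSE losses against the input color are summed. The average-pooled subsampling, the voxel shape, and the `render_ray` helper are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def multi_scale_color_loss(voxel, ray_o, ray_d, gt_rgb, num_scales):
    """voxel: full-resolution feature grid, assumed shape (1, C, D, H, W)."""
    loss = 0.0
    for l in range(num_scales):                      # l = 0 is the full resolution
        if l == 0:
            voxel_l = voxel
        else:
            # Hierarchical subsampling: each level halves the resolution, so the
            # coarse scales share (and back-propagate into) the same voxel weights.
            voxel_l = F.avg_pool3d(voxel, kernel_size=2 ** l)
        rgb_l, _ = render_ray(voxel_l, ray_o, ray_d)  # volume rendering at scale l (placeholder helper)
        loss = loss + F.mse_loss(rgb_l, gt_rgb)       # L_recon term at scale l
    return loss
```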



Effect of the cross-scale geometric adaptation


w/o cross-scale geometric adaptation
w/ cross-scale geometric adaptation

With cross-scale geometric adaptation, the model can determine the appropriate depth through reprojection errors across different scales, significantly reducing floaters and resulting in improved geometry.
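As a companion to the pseudo-GT selection sketch above, one minimal form of the adaptation loss pulls every scale's rendered depth toward the selected pseudo-GT depth. The L1 penalty and the reprojection-error gate `err_thresh` are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def cross_scale_adaptation_loss(depths_per_scale, pseudo_depth, min_error, err_thresh=0.1):
    # depths_per_scale: (L+1, N) rendered depths at every scale for the same rays.
    # pseudo_depth:     (N,) detached pseudo-GT depth from the min-reprojection-error scale.
    # min_error:        (N,) best reprojection error per ray; rays whose best warp is
    #                   still poor (e.g. occluded regions) are masked out (assumed gating).
    valid = (min_error < err_thresh).float()                         # (N,)
    per_ray = (depths_per_scale - pseudo_depth.unsqueeze(0)).abs()   # L1 depth error, (L+1, N)
    return (per_ray * valid.unsqueeze(0)).mean()
```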



Citation

Acknowledgements

This research was funded by the National Science and Technology Council, Taiwan, under Grants NSTC 112-2222-E-A49-004-MY2 and 113-2628-E-A49-023-. The authors are grateful to Google, NVIDIA, and MediaTek Inc. for generous donations. Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MOE in Taiwan.

The website template was borrowed from Michaël Gharbi and Ref-NeRF.