FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
- Chin-Yang Lin 1*
- Chung-Ho Wu 1*
- Chang-Han Yeh 2
- Shih-Han Yen 1
- Cheng Sun 3
- Yu-Lun Liu 1
FrugalNeRF turns just two images and 10 minutes into high-quality 3D scenes.
Abstract
Neural Radiance Fields (NeRF) face significant challenges in few-shot scenarios, primarily due to overfitting and long training times for high-fidelity rendering. Existing methods, such as FreeNeRF and SparseNeRF, use frequency regularization or pre-trained priors but struggle with complex scheduling and bias. We introduce FrugalNeRF, a novel few-shot NeRF framework that leverages weight-sharing voxels across multiple scales to efficiently represent scene details. Our key contribution is a cross-scale geometric adaptation scheme that selects pseudo ground truth depth based on reprojection errors across scales. This guides training without relying on externally learned priors, enabling full utilization of the training data. It can also integrate pre-trained priors, enhancing quality without slowing convergence. Experiments on LLFF, DTU, and RealEstate-10K show that FrugalNeRF outperforms other few-shot NeRF methods while significantly reducing training time, making it a practical solution for efficient and accurate 3D scene reconstruction.
Our proposed framework
(a) Our FrugalNeRF represents a scene with a pair of density and appearance voxels \((\mathbf{V}^\mathrm{D}, \mathbf{V}^\mathrm{A})\). For clarity of illustration, we show only one voxel in the figure. (b) We sample rays not only from the training input views \(\mathbf{r}_\mathrm{train}\) but also from randomly sampled novel views \(\mathbf{r}_\mathrm{novel}\). (c) We then create \(L + 1\) multi-scale voxels by hierarchical subsampling: lower-resolution voxels enforce globally consistent geometry and reduce overfitting but struggle to represent detailed structures, while higher-resolution voxels capture fine details but may get stuck in local minima or generate floaters. (d) For the rays from training views \(\mathbf{r}_\mathrm{train}\), we enforce an MSE reconstruction loss \(\mathcal{L}_\mathrm{recon}\) between the volume-rendered RGB color \(\hat{C}^l\) and the input RGB \(C\) at each scale \(l\). (e) We introduce a cross-scale geometric adaptation loss \(\mathcal{L}_\mathrm{adapt}\) for novel view rays \(\mathbf{r}_\mathrm{novel}\): we warp the volume-rendered RGB to the nearest training view using the predicted depth, compute the reprojection error \(e^l\) at each scale, and use the depth with the minimum reprojection error as pseudo-ground-truth (pseudo-GT) for depth supervision. This adaptation involves rays from both training and novel views, though the figure only depicts novel view rays for clarity.
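To make the weight-sharing across scales concrete, here is a minimal PyTorch sketch (not the official implementation; the function name and the choice of average pooling are our assumptions) that derives the \(L\) coarser grids by hierarchically subsampling a single full-resolution voxel, so that losses at every scale backpropagate into the same shared parameters.

```python
import torch
import torch.nn.functional as F

def build_multiscale_voxels(voxel: torch.Tensor, num_scales: int):
    """Illustrative sketch: derive L+1 weight-sharing voxel grids by
    hierarchically subsampling one full-resolution grid.

    `voxel` has shape (1, C, X, Y, Z). The coarser grids are pooled from it
    on the fly, so they share the full-resolution parameters rather than
    introducing new ones.
    """
    scales = [voxel]
    for _ in range(num_scales):
        # Average pooling halves the resolution; because pooling is
        # differentiable, gradients from every scale flow back to the
        # shared full-resolution voxel.
        scales.append(F.avg_pool3d(scales[-1], kernel_size=2))
    return scales  # [full-res, 1/2, 1/4, ..., 1/2^L]
```

Because all scales are views of one parameter set, the multi-scale representation adds supervision signals without increasing the number of trainable parameters.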
Baseline Comparisons on LLFF dataset
TensoRF
FrugalNeRF
Baseline method (left) vs. FrugalNeRF (right), each trained on 2 views. Try selecting different methods and scenes!
Cross-scale geometric adaptation during training
Low-resolution voxels initially guide geometry learning, with higher resolutions contributing more over time. This enables autonomous frequency tuning and better generalization.
Effect of the multi-scale voxel color loss
w/o multi-scale voxel color loss
w/ multi-scale voxel color loss
With the multi-scale voxel color loss, the model is supervised at multiple levels of detail in the scene, leading to better rendering quality and geometry.
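For reference, a minimal sketch of the per-scale reconstruction loss \(\mathcal{L}_\mathrm{recon}\) is given below; the equal weighting of scales and the function name are our assumptions, not necessarily the paper's exact formulation.

```python
import torch

def multiscale_recon_loss(rendered_rgb_per_scale, gt_rgb):
    """Sketch of the reconstruction loss summed over scales.

    `rendered_rgb_per_scale`: list of (N_rays, 3) tensors, one per scale l,
        volume-rendered from the corresponding voxel grid.
    `gt_rgb`: (N_rays, 3) ground-truth colors from the training views.
    """
    loss = 0.0
    for rgb_l in rendered_rgb_per_scale:
        # MSE between the scale-l rendering and the input RGB.
        loss = loss + torch.mean((rgb_l - gt_rgb) ** 2)
    return loss
```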
Effect of the cross-scale geometric adaptation
w/o cross-scale geometric adaptation
w/ cross-scale geometric adaptation
With cross-scale geometric adaptation, the model selects the most reliable depth by comparing reprojection errors across scales, significantly reducing floaters and improving geometry.
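The selection step can be sketched as follows, assuming the per-scale reprojection errors \(e^l\) have already been computed by warping each ray's rendered color into the nearest training view with the scale-l depth; the L2 depth penalty and the names used here are illustrative assumptions rather than the exact loss in the paper.

```python
import torch

def cross_scale_adaptation_loss(depths, reproj_errors):
    """Illustrative sketch of the cross-scale geometric adaptation loss.

    `depths`: (L+1, N_rays) predicted depth per scale for each novel-view ray.
    `reproj_errors`: (L+1, N_rays) reprojection errors e^l obtained by warping
        the rendered color to the nearest training view using the scale-l depth
        and comparing against the observed color there.
    """
    # Pick, per ray, the scale whose depth yields the smallest reprojection error.
    best_scale = torch.argmin(reproj_errors, dim=0)                    # (N_rays,)
    pseudo_gt = depths.gather(0, best_scale.unsqueeze(0)).squeeze(0)   # (N_rays,)
    # The pseudo-GT depth is treated as a fixed target (no gradient through it).
    pseudo_gt = pseudo_gt.detach()
    # Supervise the depth at every scale with the selected pseudo-GT.
    return torch.mean((depths - pseudo_gt.unsqueeze(0)) ** 2)
```

The key idea is that no external depth prior is needed: the scale that currently explains the observed images best, as measured by reprojection error, teaches the other scales.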
Citation
Acknowledgements
This research was funded by the National Science and Technology Council, Taiwan, under Grants NSTC 112-2222-E-A49-004-MY2 and 113-2628-E-A49-023-. The authors are grateful to Google, NVIDIA, and MediaTek Inc. for generous donations. Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MOE in Taiwan.
The website template was borrowed from Michaël Gharbi and Ref-NeRF.