AAAI 2026 Oral
Our method effectively leverages geometric priors from a pretrained 3D reconstruction model, achieving more plausible normal map recovery in challenging scenes with complex backgrounds and limited lighting variation. Compared to state-of-the-art monocular normal prediction models (e.g., MoGe-2 (Wang et al. 2025c)), our approach captures finer surface details by incorporating multi-illumination cues.
Universal Photometric Stereo is a promising approach for recovering surface normals without strict lighting assumptions. However, it struggles when multi-illumination cues are unreliable, such as under biased lighting or in shadowed or self-occluded regions of complex in-the-wild scenes. We propose GeoUniPS, a universal photometric stereo network that integrates synthetic supervision with high-level geometric priors from large-scale 3D reconstruction models pretrained on massive in-the-wild data. Our key insight is that these 3D reconstruction models serve as visual-geometry foundation models, inherently encoding rich geometric knowledge of real scenes. To leverage this, we design a Light-Geometry Dual-Branch Encoder that extracts both multi-illumination cues and geometric priors from the frozen 3D reconstruction model. We also address the limitations of the conventional orthographic projection assumption by introducing the PS-Perp dataset, rendered with realistic perspective projection, to enable learning of spatially varying view directions. Extensive experiments demonstrate that GeoUniPS delivers state-of-the-art performance across multiple datasets, both quantitatively and qualitatively, especially in complex in-the-wild scenes.
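The difference between the orthographic assumption and perspective projection can be illustrated with a minimal NumPy sketch. It assumes a pinhole camera looking along +z with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and defines the view direction as the unit vector from the surface point toward the camera. The function names are illustrative only and are not taken from the GeoUniPS code or the PS-Perp dataset.

```python
import numpy as np

def perspective_view_directions(height, width, fx, fy, cx, cy):
    """Per-pixel unit view directions for a pinhole camera (assumed model).

    Returns an array of shape (height, width, 3). Each vector points from
    the scene toward the camera, so its z component is negative when the
    camera looks along +z.
    """
    # Pixel centers sit at half-integer coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    # Back-project each pixel to a ray on the z = 1 plane.
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    return -rays  # flip so the vector points toward the camera

def orthographic_view_directions(height, width):
    """Constant view direction used under the orthographic assumption."""
    dirs = np.zeros((height, width, 3))
    dirs[..., 2] = -1.0
    return dirs

def max_deviation_degrees(height, width, fx, fy, cx, cy):
    """Largest angle between the perspective and orthographic directions."""
    persp = perspective_view_directions(height, width, fx, fy, cx, cy)
    ortho = orthographic_view_directions(height, width)
    cos = np.clip(np.sum(persp * ortho, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).max())
```

Calling `max_deviation_degrees(480, 640, 500.0, 500.0, 320.0, 240.0)` reports how far the corner view directions tilt away from the constant orthographic direction for that hypothetical camera. The deviation grows as the focal length shrinks, which is the spatial variation that a network trained only on orthographic renderings never observes.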
Overview of our GeoUniPS architecture. Given multiple input images captured under different lighting conditions, the Light-Geometry Dual-Branch Encoder extracts both light-variant features from multi-illumination cues (EncoderIL) and geometric features from the pretrained VGGT aggregator (EncoderGeo). These features are concatenated with the input images using an MLP-based embedding, after which the Dual-Scale Normal Decoder performs pixel-wise normal regression at sampled locations.
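The data flow in the caption above can be sketched at the level of array shapes. This is a toy illustration under several assumptions, not the authors' implementation: the two encoder outputs are stand-in arrays at full image resolution, a small two-layer MLP stands in for the embedding, mean pooling over the input images is an assumed aggregation step, and a second MLP replaces the Dual-Scale Normal Decoder. All names (`mlp`, `regress_normals`, `params`) are hypothetical.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU, applied over the last axis."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

def regress_normals(images, light_feats, geo_feats, pixels, params):
    """Fuse per-pixel inputs and regress unit normals at sampled pixels.

    images:      (K, H, W, 3)   K images under different lighting
    light_feats: (K, H, W, Cl)  stand-in for light-variant features
    geo_feats:   (K, H, W, Cg)  stand-in for geometric-prior features
    pixels:      (P, 2) integer (row, col) sample locations
    returns:     (P, 3) unit-length normals
    """
    rows, cols = pixels[:, 0], pixels[:, 1]
    # Gather all three sources at the sampled locations: each is (K, P, C).
    fused = np.concatenate(
        [images[:, rows, cols], light_feats[:, rows, cols], geo_feats[:, rows, cols]],
        axis=-1,
    )
    embedded = mlp(fused, *params["embed"])   # (K, P, D)
    pooled = embedded.mean(axis=0)            # (P, D), assumed aggregation
    normals = mlp(pooled, *params["head"])    # (P, 3)
    norms = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.maximum(norms, 1e-12)

def random_params(rng, c_in, d):
    """Random weights so the sketch runs end to end; nothing is trained."""
    return {
        "embed": (rng.normal(size=(c_in, d)), np.zeros(d),
                  rng.normal(size=(d, d)), np.zeros(d)),
        "head": (rng.normal(size=(d, d)), np.zeros(d),
                 rng.normal(size=(d, 3)), np.zeros(3)),
    }
```

Regressing only at sampled pixel locations keeps the per-pixel stage independent of image resolution, since its cost scales with the number of samples rather than with the full image size.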
@inproceedings{kmtam2026geounips,
title={Geometry Meets Light: Leveraging Geometric Priors for Universal Photometric Stereo Under Limited Multi-Illumination Cues},
author={King-Man Tam and Satoshi Ikehata and Yuta Asano and Zhaoyi An and Rei Kawakami},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}