Geometry Meets Light: Leveraging Geometric Priors for Universal Photometric Stereo Under Limited Multi-Illumination Cues

AAAI 2026 Oral

King-Man Tam1, Satoshi Ikehata2,3, Yuta Asano2, Zhaoyi An1, Rei Kawakami1
1Institute of Science Tokyo, 2National Institute of Informatics, 3Denso IT Laboratory

Overview

GeoUniPS teaser

Our method effectively leverages geometric priors from a pretrained 3D reconstruction model, achieving more plausible normal-map recovery in challenging scenes with complex backgrounds and limited lighting variation. Compared to state-of-the-art monocular normal prediction models (e.g., MoGe-2 (Wang et al. 2025c)), our approach captures finer surface details by incorporating multi-illumination cues.

Abstract

Universal Photometric Stereo is a promising approach for recovering surface normals without strict lighting assumptions. However, it struggles when multi-illumination cues are unreliable, such as under biased lighting, or in shadowed or self-occluded regions of complex in-the-wild scenes. We propose GeoUniPS, a universal photometric stereo network that integrates synthetic supervision with high-level geometric priors from large-scale 3D reconstruction models pretrained on massive in-the-wild data. Our key insight is that these 3D reconstruction models serve as visual-geometry foundation models, inherently encoding rich geometric knowledge of real scenes. To leverage this, we design a Light-Geometry Dual-Branch Encoder that extracts both multi-illumination cues and geometric priors from the frozen 3D reconstruction model. We also address the limitations of the conventional orthographic projection assumption by introducing the PS-Perp dataset, rendered with realistic perspective projection to enable learning of spatially varying view directions. Extensive experiments demonstrate that GeoUniPS delivers state-of-the-art performance across multiple datasets, both quantitatively and qualitatively, especially in complex in-the-wild scenes.

Method

GeoUniPS pipeline

Overview of our GeoUniPS architecture. Given multiple input images captured under different lighting conditions, the Light-Geometry Dual-Branch Encoder extracts both light-variant features from multi-illumination cues (EncoderIL) and geometric features from the pretrained VGGT aggregator (EncoderGeo). These features are concatenated with the input images using an MLP-based embedding, after which the Dual-Scale Normal Decoder performs pixel-wise normal regression at sampled locations.
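The dual-branch fusion described above can be sketched in a few lines. This is a minimal, illustrative numpy mock-up, not the actual implementation: the feature arrays stand in for the outputs of EncoderIL and the frozen EncoderGeo, and all dimensions, weights, and the single-layer MLP are hypothetical placeholders chosen only to show the concatenate-embed-regress-normalize flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # Simple two-layer MLP with ReLU; stands in for the embedding + decoder stack.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

# Hypothetical sizes: N sampled pixel locations, d_l light-variant dims, d_g geometric dims.
N, d_l, d_g, d_h = 4, 8, 8, 16
f_light = rng.standard_normal((N, d_l))  # stand-in for EncoderIL features (multi-illumination cues)
f_geo = rng.standard_normal((N, d_g))    # stand-in for frozen EncoderGeo features (geometric priors)

# Dual-branch fusion: concatenate the two feature streams per pixel.
f = np.concatenate([f_light, f_geo], axis=1)

# Placeholder weights for the MLP-based embedding and pixel-wise normal regression.
w1 = rng.standard_normal((d_l + d_g, d_h)) * 0.1
b1 = np.zeros(d_h)
w2 = rng.standard_normal((d_h, 3)) * 0.1
b2 = np.zeros(3)

raw = mlp(f, w1, b1, w2, b2)  # one 3-vector per sampled pixel
# Normalize to unit length so each prediction is a valid surface normal.
normals = raw / (np.linalg.norm(raw, axis=1, keepdims=True) + 1e-8)
print(normals.shape)  # one unit normal per sampled pixel: (4, 3)
```

The key design point this illustrates is that the geometric branch is frozen: only the light-variant branch and the decoder would receive gradients during training, so the pretrained geometric priors are consumed as-is.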

Results

GeoUniPS results

BibTeX

@inproceedings{kmtam2026geounips,
  title={Geometry Meets Light: Leveraging Geometric Priors for Universal Photometric Stereo Under Limited Multi-Illumination Cues},
  author={King-Man Tam and Satoshi Ikehata and Yuta Asano and Zhaoyi An and Rei Kawakami},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}