
ReLi3D — Relightable multi-view 3D reconstruction with disentangled illumination and spatially varying PBR materials.
ReLi3D is a unified end-to-end pipeline that simultaneously reconstructs complete 3D geometry, spatially varying physically based materials, and environment illumination from sparse multi-view images in under one second.
The key idea is to treat multi-view fusion as the main mechanism for material-lighting disentanglement. Instead of relying on an ill-posed single-view decomposition, cross-view constraints narrow the space of feasible solutions and improve the quality of the resulting relightable assets.
ReLi3D uses a shared multi-view cross-conditioning transformer followed by two prediction paths: (1) a geometry and appearance path that predicts mesh plus spatially varying BRDF, and (2) an illumination path that predicts a coherent HDR environment in RENI++ latent space.

Cross-view Fusion — A variable number of masked input views is encoded with DINOv2 and camera-aware modulation. One hero view anchors the query stream while additional views provide cross-attention memory, producing triplane features that are consistent across viewpoints.
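The hero-view anchoring above can be sketched as plain cross-attention: the hero view supplies the queries, while tokens from the remaining views are concatenated into the attention memory. The single-head NumPy sketch below is only an illustration of that mechanism; the actual model uses a multi-head transformer with DINOv2 features and camera-aware modulation, and all function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hero_cross_attention(hero_tokens, other_view_tokens):
    """Single-head cross-attention sketch (hypothetical helper):
    the hero (anchor) view supplies queries, the remaining views
    supply the keys/values that act as cross-attention memory.

    hero_tokens:       (N_q, D) tokens from the anchor view
    other_view_tokens: list of (N_i, D) token arrays, one per extra view
    """
    memory = np.concatenate(other_view_tokens, axis=0)   # (sum N_i, D)
    d = hero_tokens.shape[-1]
    scores = hero_tokens @ memory.T / np.sqrt(d)         # (N_q, sum N_i)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ memory                              # (N_q, D)
```

Because the memory is a concatenation over however many extra views are present, the same code path handles a variable number of input views.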
Two-path Prediction — The geometry and appearance path predicts density, mesh structure, and spatially varying albedo, roughness, metallic, and bump normals from a shared triplane embedding. In parallel, the illumination path fuses mask-aware tokens with object tokens to infer RENI++ latents and recover HDR environment lighting.
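Querying a shared triplane embedding at a 3D point typically means sampling three axis-aligned feature planes and aggregating the results before a decoding head predicts density and SVBRDF channels. The sketch below illustrates that lookup under the assumption of bilinear sampling and sum aggregation; it is not ReLi3D's exact head, and the names are hypothetical.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (H, W, C) feature plane at normalized (u, v) in [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    top = plane[y0, x0] * (1 - fx) + plane[y0, x1] * fx
    bot = plane[y1, x0] * (1 - fx) + plane[y1, x1] * fx
    return top * (1 - fy) + bot * fy

def query_triplane(planes, p):
    """Project point p (coords in [0, 1]) onto the XY, XZ, and YZ planes,
    sample each, and sum the three feature vectors. A decoder MLP would
    then map this feature to density / albedo / roughness / metallic."""
    x, y, z = p
    return (bilinear_sample(planes['xy'], x, y)
            + bilinear_sample(planes['xz'], x, z)
            + bilinear_sample(planes['yz'], y, z))
```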
Disentangled Training — A differentiable Monte Carlo renderer with Multiple Importance Sampling ties both paths together. This enforces physically meaningful material-lighting separation and allows mixed-domain training across synthetic PBR-supervised data, synthetic RGB-only data, and real-world captures.
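Multiple Importance Sampling combines two sampling strategies, e.g. BRDF sampling and light sampling, by weighting each sample according to the balance heuristic: a sample drawn from strategy A with densities p_A and p_B receives weight p_A / (p_A + p_B). This is the standard textbook heuristic, not a detail specific to the paper's renderer:

```python
def balance_heuristic(pdf_this, pdf_other):
    """MIS balance heuristic: weight for a sample drawn from the strategy
    with density pdf_this, when a second strategy with density pdf_other
    could also have produced it. For the same sample, the weights of the
    two strategies sum to 1, keeping the combined estimator unbiased."""
    return pdf_this / (pdf_this + pdf_other)
```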
The ICLR paper focuses on disentanglement quality: ReLi3D predicts fully spatially varying PBR channels and coherent illumination, improving significantly with additional views.

PBR Prediction — On Polyhaven + Blender Shiny evaluation, ReLi3D reports 25.00 dB basecolor PSNR, 22.69 dB roughness PSNR, and 32.73 dB metallic PSNR in the single-view setting, with further gains in multi-view mode.
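The PSNR figures above follow the standard definition, 10 · log10(max² / MSE). For reference, a minimal implementation assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```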

Illumination Estimation — The model recovers plausible sky color and dominant light direction even from a single view, and improves source localization when more views or background evidence are available. Compared to prior RENI++ usage, the multi-view illumination path produces sharper and more coherent environment predictions.
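One way to sanity-check a recovered dominant light direction is a luminance- and solid-angle-weighted mean of per-pixel directions over the equirectangular HDR map. The sketch below assumes a z-up convention with the polar angle measured from +z; it is a diagnostic illustration, not the paper's RENI++-based pipeline.

```python
import numpy as np

def dominant_light_direction(env):
    """Estimate the dominant light direction of an equirectangular HDR
    environment map env of shape (H, W, 3) as the solid-angle- and
    luminance-weighted mean of per-pixel unit directions (z-up)."""
    H, W, _ = env.shape
    theta = (np.arange(H) + 0.5) / H * np.pi            # polar angle from +z
    phi = (np.arange(W) + 0.5) / W * 2.0 * np.pi        # azimuth
    lum = env @ np.array([0.2126, 0.7152, 0.0722])      # Rec. 709 luminance
    w = lum * np.sin(theta)[:, None]                    # solid-angle weight
    dirs = np.stack([
        np.sin(theta)[:, None] * np.cos(phi)[None, :],
        np.sin(theta)[:, None] * np.sin(phi)[None, :],
        np.cos(theta)[:, None] * np.ones_like(phi)[None, :],
    ], axis=-1)                                          # (H, W, 3)
    d = (w[..., None] * dirs).sum(axis=(0, 1))
    return d / np.linalg.norm(d)
```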

ReLi3D achieves strong reconstruction quality while staying interactive (about 0.3s per object). The paper evaluates both synthetic and real-world datasets and shows consistent gains from multi-view conditioning.

Overall Reconstruction — In the combined GSO + Stanford ORB evaluation, ReLi3D reaches 19.57 dB PSNR and 0.902 SSIM in the single-view setting, and improves further with more views. On UCO3D, it reports 15.28 dB PSNR single-view and up to 15.73 dB PSNR in multi-view settings.
Cross-domain Real-world Performance — The model is trained on 174k objects across synthetic PBR, synthetic RGB-only, and real-world UCO3D captures. On Stanford ORB, ReLi3D outperforms baselines across geometry, image quality, and basecolor metrics, and continues to improve from 1-view to 16-view input.

Limitations include harder decomposition in scenes with multiple strong artificial light sources and resolution limits inherent to the triplane representation. The paper concludes that the proposed disentanglement framework enables fast generation of complete relightable assets and forms a practical foundation for future material-aware 3D reconstruction systems.
This page reflects the newer ICLR version of ReLi3D. Code and pretrained weights are planned for release on GitHub and Hugging Face.