Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats

Mingyang Xie¹*, Haoming Cai¹*, Sachin Shah¹, Yiran Xu¹, Brandon Y. Feng²,

Jia-Bin Huang¹, Christopher Metzler¹

¹University of Maryland, ²MIT

*Equal Contribution

ECCV 2024


Demo of 3D Reflection Separation

Video panels: Composite 3D Scene | NeRFReN Transmission | NeRFReN Reflection | Ours Transmission | Ours Reflection

By leveraging cues from the camera flash, our proposed method significantly outperforms NeRFReN at separating transmission from reflection.

Proposed Capturing Mode

The user first captures a set of multi-view images with the camera flash on, and then captures another set of images with the flash off (no paired capture is required). Our algorithm leverages these flash/no-flash images for 3D reflection separation.

Motivation

Flash/No-Flash for Reflection Removal. The difference between a paired flash image and no-flash image is equivalent to a photo taken with flash in a dark environment, which is reflection-free (top). This works because the flash increases the brightness of the transmission but not of the reflection. Note that the pair must be tightly aligned for this method to work: even tiny vibrations, such as those from pressing the shutter button on a tripod-mounted camera, produce artifacts (bottom).
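The subtraction described above can be illustrated with a toy example. This is a minimal sketch on synthetic data, not the authors' code: it assumes a simplified additive image model in which the flash scales the transmission by a made-up `flash_gain` and leaves the reflection unchanged. The function name `flash_difference` and all numeric values are illustrative assumptions.

```python
import numpy as np

def flash_difference(flash_img, no_flash_img):
    """Subtract a no-flash image from a flash image, clipping negatives to zero."""
    diff = flash_img.astype(np.float64) - no_flash_img.astype(np.float64)
    return np.clip(diff, 0.0, None)

# Synthetic toy scene; every value here is made up for illustration.
rng = np.random.default_rng(0)
transmission = rng.uniform(0.1, 0.4, size=(32, 32))  # scene behind the glass
reflection = rng.uniform(0.0, 0.3, size=(32, 32))    # scene reflected by the glass
flash_gain = 0.5  # assumed extra transmission brightness added by the flash

# Assumed model: the flash brightens the transmission only.
no_flash = transmission + reflection
flash = (1.0 + flash_gain) * transmission + reflection

# Aligned pair: the reflection cancels, leaving only the flash-lit transmission.
aligned_error = np.abs(
    flash_difference(flash, no_flash) - flash_gain * transmission
).max()

# Misaligned pair: a one-pixel shift already leaves a visible residual.
shifted_no_flash = np.roll(no_flash, 1, axis=1)
misaligned_error = np.abs(
    flash_difference(flash, shifted_no_flash) - flash_gain * transmission
).max()

print(f"aligned error:    {aligned_error:.2e}")
print(f"misaligned error: {misaligned_error:.2e}")
```

Under these assumptions the aligned error is at floating-point level, while the one-pixel shift produces a clearly non-zero residual, which mirrors the alignment sensitivity noted above.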

Our Intuition

We create a "pseudo-pair" of flash/no-flash images through novel view synthesis. During data capture, we collect flash and no-flash images from different views (no paired capture is required). For instance, if we have captured a no-flash image at View 2, we can learn a 3D representation from the flash images captured at other views and then synthesize a novel flash view at View 2. This gives us a pseudo-pair of flash and no-flash images at View 2. The key idea is that the difference between the pseudo-pair yields the transmission component of View 2, free of reflection.
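The pseudo-pair idea can be sketched as a small function that takes a captured no-flash image, its camera pose, and a renderer for the flash scene. This is only an illustration under stated assumptions: `Renderer`, `pseudo_pair_difference`, and `toy_flash_renderer` are hypothetical names, and the toy renderer stands in for a learned 3D representation of the flash images.

```python
from typing import Callable

import numpy as np

# Hypothetical interface: maps a 4x4 camera pose to a rendered flash image.
Renderer = Callable[[np.ndarray], np.ndarray]

def pseudo_pair_difference(no_flash_img, pose, render_flash: Renderer):
    """Synthesize the flash view at `pose` and subtract the captured no-flash image."""
    synthesized_flash = render_flash(pose)
    if synthesized_flash.shape != no_flash_img.shape:
        raise ValueError("rendered flash view and no-flash image must share a shape")
    return np.clip(synthesized_flash - no_flash_img, 0.0, None)

def toy_flash_renderer(pose):
    """Stand-in for a learned 3D representation; ignores the pose and returns a constant image."""
    return np.full((4, 4), 0.8)

view2_pose = np.eye(4)                  # placeholder pose for "View 2"
view2_no_flash = np.full((4, 4), 0.5)   # placeholder captured no-flash image
estimate = pseudo_pair_difference(view2_no_flash, view2_pose, toy_flash_renderer)
```

Because the flash view is rendered at the exact pose of the no-flash capture, the pair is aligned by construction, which is what the paired-capture approach cannot guarantee.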

Proposed Method

We jointly optimize four sets of 3D Gaussian splats: the transmitted scene captured with flash $T_{F}$, the transmitted scene captured without flash $T_{N}$, the reflected scene $R$, and the reflection ratio map $\beta$. Following the flash/no-flash idea, $R$ and $\beta$ are shared between the flash image and the no-flash image, and we encourage a linear relationship between $T_{F}$ and $T_{N}$.
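To make the roles of the four components concrete, the sketch below works on already-rendered 2D images rather than Gaussian splats. Two things are assumptions not stated on this page: the composition rule (transmission plus $\beta$ times reflection, as commonly used in reflection-separation work), and the least-squares form of the linearity penalty. The names `compose` and `linearity_penalty` are hypothetical.

```python
import numpy as np

def compose(transmission, reflection, beta):
    """Assumed composition rule: image = transmission + beta * reflection."""
    return transmission + beta * reflection

def linearity_penalty(t_flash, t_no_flash):
    """Fit a single scale s with t_flash ~ s * t_no_flash; return (mean squared residual, s)."""
    x = t_no_flash.ravel()
    y = t_flash.ravel()
    scale = float(x @ y) / float(x @ x)  # closed-form least-squares scale
    residual = float(np.mean((y - scale * x) ** 2))
    return residual, scale

# Toy rendered components; all values are made up for illustration.
rng = np.random.default_rng(0)
t_no_flash = rng.uniform(0.1, 0.4, size=(16, 16))  # T_N
t_flash = 1.5 * t_no_flash                          # T_F, exactly linear in this toy case
reflection = rng.uniform(0.0, 0.3, size=(16, 16))  # R, shared by both images
beta = rng.uniform(0.2, 0.8, size=(16, 16))        # reflection ratio map, also shared

image_flash = compose(t_flash, reflection, beta)
image_no_flash = compose(t_no_flash, reflection, beta)

penalty, scale = linearity_penalty(t_flash, t_no_flash)
```

Since $R$ and $\beta$ are shared, the reflection term cancels in `image_flash - image_no_flash` under this assumed rule, leaving only the difference between the two transmission components.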

Additional Demo