Mobile photography has made great strides in recent years, but low-light imaging remains a challenge. Long exposures improve the signal-to-noise ratio (SNR), but they introduce undesirable motion blur when capturing dynamic scenes. Consequently, imaging pipelines often rely on computational photography to improve SNR by fusing multiple short exposures. Recent deep network-based methods have been shown to generate visually pleasing results by fusing these exposures in a sophisticated manner, but often at a higher computational cost.
We propose an end-to-end trainable burst denoising pipeline that jointly captures high-resolution and high-frequency deep features derived from wavelet transforms. In our model, the high-frequency sub-band features preserve fine local details and thereby enhance the final perceptual quality, while the low-frequency sub-band features carry the structural information needed for faithful reconstruction and objective quality. The model accommodates variable-length burst captures via temporal feature shifting, which incurs only marginal computational overhead, and it is further trained with a realistic noise model so that it generalizes to real environments. Using these techniques, our method attains state-of-the-art performance on perceptual quality while being an order of magnitude faster.
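To make the two building blocks named above concrete, the following is a minimal sketch, not the model itself. It assumes a single-level orthonormal Haar wavelet and a zero-padded channel shift between neighbouring frames; the wavelet family, the shift scheme, and the names `haar_dwt2`, `temporal_shift`, and `fold` are illustrative assumptions that the text above does not specify.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """Single-level 2D Haar transform (assumed wavelet; illustrative only).

    Returns (low, detail_rows, detail_cols, detail_diag), each at half
    the input resolution. `low` is the low-frequency sub-band; the other
    three are high-frequency sub-bands.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, d_rows, d_cols, d_diag = [], [], [], []
    for i in range(0, h, 2):
        r_low, r_rows, r_cols, r_diag = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_low.append((a + b + c + d) / 2)   # scaled local average: structure
            r_rows.append((a + b - c - d) / 2)  # change between the two rows
            r_cols.append((a - b + c - d) / 2)  # change between the two columns
            r_diag.append((a - b - c + d) / 2)  # diagonal detail
        low.append(r_low)
        d_rows.append(r_rows)
        d_cols.append(r_cols)
        d_diag.append(r_diag)
    return low, d_rows, d_cols, d_diag


def temporal_shift(frames: List[List[float]], fold: int) -> List[List[float]]:
    """Shift a few feature channels between neighbouring burst frames.

    frames[t][c] is channel c of frame t. The first `fold` channels are
    taken from the previous frame, the next `fold` from the following
    frame, and the rest stay in place. Missing neighbours are zero-filled,
    so the function works for any burst length and adds no multiplications.
    """
    if not frames:
        return []
    if fold < 0 or 2 * fold > len(frames[0]):
        raise ValueError("need 0 <= 2 * fold <= number of channels")
    n = len(frames)
    out = [list(f) for f in frames]
    for t in range(n):
        for c in range(fold):
            out[t][c] = frames[t - 1][c] if t > 0 else 0.0
        for c in range(fold, 2 * fold):
            out[t][c] = frames[t + 1][c] if t + 1 < n else 0.0
    return out
```

On a flat image patch, all three detail sub-bands are zero and only `low` is non-zero, which illustrates why structure and fine detail can be handled by separate feature paths. Because `temporal_shift` only moves values between frames, its cost grows linearly with burst length and involves no extra arithmetic, which is consistent with the marginal overhead claimed above.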