Segmentation-Aware Convolutional Networks using Local Attention Masks

International Conference on Computer Vision (ICCV)

Abstract

We introduce an approach to integrate segmentation information within a convolutional neural network (CNN). This counter-acts the tendency of CNNs to smooth information across regions and increases their spatial precision. To obtain segmentation information, we set up a CNN to provide an embedding space where region co-membership can be estimated based on Euclidean distance. We use these embeddings to compute a local attention mask relative to every neuron position. We incorporate such masks in CNNs and replace the convolution operation with a “segmentation-aware” variant that allows a neuron to selectively attend to inputs coming from its own region. We call the resulting network a segmentation-aware CNN because it adapts its filters at each image point according to local segmentation cues, while at the same time remaining fully-convolutional. We demonstrate the merit of our method on two widely different dense prediction tasks, that involve classification (semantic segmentation) and regression (optical flow). Our results show that in semantic segmentation we can replace DenseCRF inference with a cascade of segmentation-aware filters, and in optical flow we obtain clearly sharper responses than the ones obtained with comparable networks that do not use segmentation. In both cases segmentation-aware convolution yields systematic improvements over strong baselines.

Featured Publications