October 10, 2016

Polysemous Codes

European Conference on Computer Vision 2016 (ECCV)

By: Matthijs Douze, Hervé Jégou, Florent Perronnin

Abstract

This paper considers the problem of approximate nearest neighbor search in the compressed domain. We introduce polysemous codes, which offer both the distance estimation quality of product quantization and the efficient comparison of binary codes with Hamming distance. Their design is inspired by algorithms introduced in the 1990s to construct channel-optimized vector quantizers. At search time, this dual interpretation accelerates the search: most of the indexed vectors are filtered out with Hamming distance, leaving only a fraction of the vectors to be ranked with an asymmetric distance estimator. The method is complementary to a coarse partitioning of the feature space such as the inverted multi-index, as our experiments on several benchmarks show. In particular, our approach outperforms the state of the art by a large margin on the BIGANN dataset comprising one billion vectors. Finally, our approach allows the approximate computation of the kNN graph associated with the Yahoo Flickr Creative Commons 100M dataset, described by CNN image descriptors, in less than 8 hours on a single machine.
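The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. All names (`CODEBOOK`, `encode`, `search`, `ham_threshold`) are hypothetical, and the codebook is a hand-built toy: each centroid is a hypercube vertex whose coordinates are the bits of its index, so Hamming distance between indices matches squared distance between centroids by construction. This stands in for the optimized index assignment that the paper learns.

```python
import random

BITS = 4  # bits per sub-quantizer index (toy value, an assumption)

# Toy codebook (assumption): centroid k is the hypercube vertex given by the
# bits of k, so Hamming distance between two indices equals the squared
# Euclidean distance between their centroids.
CODEBOOK = [[float((k >> b) & 1) for b in range(BITS)] for k in range(1 << BITS)]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def split(vec, m):
    """Cut a vector into m contiguous sub-vectors of equal length."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]


def encode(vec, m):
    """Product-quantize: one nearest-centroid index per sub-vector."""
    return tuple(
        min(range(len(CODEBOOK)), key=lambda k: sq_dist(sub, CODEBOOK[k]))
        for sub in split(vec, m)
    )


def hamming(a, b):
    """Hamming distance between two codes, read as concatenated bit strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def search(query, codes, m, ham_threshold, k):
    """Filter by Hamming distance, then rank survivors asymmetrically."""
    q_code = encode(query, m)
    # Asymmetric estimator: the query stays uncompressed; precompute its
    # distance to every centroid of every sub-quantizer.
    tables = [[sq_dist(sub, c) for c in CODEBOOK] for sub in split(query, m)]
    # Stage 1: cheap binary comparison discards most candidates.
    shortlist = [i for i, c in enumerate(codes) if hamming(q_code, c) <= ham_threshold]
    # Stage 2: table look-ups rank only the shortlisted codes.
    ranked = sorted(
        (sum(t[j] for t, j in zip(tables, codes[i])), i) for i in shortlist
    )
    return ranked[:k], len(shortlist)


random.seed(0)
M = 4  # number of sub-quantizers (toy value)
database = [[random.random() for _ in range(M * BITS)] for _ in range(1000)]
codes = [encode(v, M) for v in database]
query = [random.random() for _ in range(M * BITS)]
top, n_checked = search(query, codes, M, ham_threshold=5, k=3)
```

Setting `ham_threshold` to `M * BITS` disables the filter and falls back to exhaustive ranking with the asymmetric estimator. Lowering it trades recall for fewer table look-ups.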