Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

International Conference on Learning Representations (ICLR)


The use of deep pre-trained transformers has led to remarkable progress in a number of applications (Devlin et al., 2019). For tasks that make pairwise comparisons between sequences, matching a given input with a corresponding label, two approaches are common: Cross-encoders performing full self-attention over the pair and Bi-encoders encoding the pair separately. The former often performs better but is too slow for practical use. In this work, we develop a new transformer architecture, the Poly-encoder, that learns global rather than token-level self-attention features. We perform a detailed comparison of all three approaches, including what pre-training and fine-tuning strategies work best. We show our models achieve state-of-the-art results on four tasks; that Poly-encoders are faster than Cross-encoders and more accurate than Bi-encoders; and that the best results are obtained by pre-training on large datasets similar to the downstream tasks.
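The abstract names the three architectures without showing how they score a candidate. The following is a minimal sketch of the scoring step only. It assumes the transformer outputs are already available as NumPy arrays and leaves out the encoders themselves, the pre-training, and the fine-tuning. The structure follows the Poly-encoder as described in the full paper: m learned "codes" attend over the context's token outputs to produce m global features, then each candidate embedding attends over those features before a final dot product. The function names, array shapes, and the use of plain dot-product attention here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bi_encoder_score(ctx_vec, cand_vecs):
    """Bi-encoder: context and candidates are encoded separately.

    ctx_vec:   (d,)   single context embedding
    cand_vecs: (n, d) candidate embeddings (can be precomputed and cached)
    returns:   (n,)   one score per candidate
    """
    return cand_vecs @ ctx_vec


def poly_encoder_score(ctx_tokens, codes, cand_vecs):
    """Poly-encoder scoring sketch (hypothetical helper, not the paper's code).

    ctx_tokens: (T, d) per-token outputs of the context encoder
    codes:      (m, d) learned code vectors
    cand_vecs:  (n, d) candidate embeddings (can be precomputed and cached)
    returns:    (n,)   one score per candidate
    """
    # Step 1: each code attends over the context tokens -> m global features.
    code_attn = softmax(codes @ ctx_tokens.T, axis=-1)  # (m, T)
    global_feats = code_attn @ ctx_tokens                # (m, d)

    # Step 2: each candidate attends over the m global features,
    # giving a candidate-specific context vector.
    cand_attn = softmax(cand_vecs @ global_feats.T, axis=-1)  # (n, m)
    ctx_per_cand = cand_attn @ global_feats                    # (n, d)

    # Step 3: dot product between each context vector and its candidate.
    return np.sum(ctx_per_cand * cand_vecs, axis=-1)           # (n,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T, m, n = 8, 5, 4, 3
    ctx_tokens = rng.normal(size=(T, d))
    codes = rng.normal(size=(m, d))
    cands = rng.normal(size=(n, d))
    scores = poly_encoder_score(ctx_tokens, codes, cands)
    print("best candidate:", int(np.argmax(scores)))
```

As with the Bi-encoder, the candidate embeddings never pass through the context encoder, so they can be computed once and reused across queries. A Cross-encoder has to run full self-attention over every context/candidate pair, which is what makes it slow. The extra cost in the Poly-encoder is limited to the two small attention steps over m global features.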
