Holger Schwenk

Research Scientist

I am a research scientist at Facebook Artificial Intelligence Research in Paris. I received my PhD in computer science from the University Paris 6 in 1996. I then spent one year at the University of Montreal working with Y. Bengio and one year at the International Computer Science Institute in Berkeley. From 1998 to 2007, I was an assistant professor at the University Paris 11/LIMSI. Before joining Facebook in 2015, I was a professor of computer science at the University of Le Mans, where I led a large research group on statistical machine translation.

In 2013, I was appointed a senior member of the Institut Universitaire de France.


Research interests: natural language processing, machine translation, human-machine interaction and deep neural networks

Latest Publications

IWSLT - August 2, 2021

FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

Yun Tang, Hongyu Gong, Xian Li, Changhan Wang, Juan Pino, Holger Schwenk, Naman Goyal

ACL - May 2, 2021

MLQA: Evaluating Cross-lingual Extractive Question Answering

Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk

EACL - April 22, 2021

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, Francisco Guzmán

ACL - August 2, 2019

Low-Resource Corpus Filtering using Multilingual Sentence Embeddings

Vishrav Chaudhary, Yuqing Tang, Francisco (Paco) Guzmán, Holger Schwenk, Philipp Koehn

ACL - July 27, 2019

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

Mikel Artetxe, Holger Schwenk

EMNLP 2018 - October 29, 2018

XNLI: Evaluating Cross-lingual Sentence Representations

Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel R. Bowman, Holger Schwenk, Ves Stoyanov

LREC 2018 - May 7, 2018

A Corpus for Multilingual Document Classification in Eight Languages

Holger Schwenk, Xian Li

EMNLP 2017 - September 7, 2017

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes

ACL Workshop on Representation Learning for NLP - July 31, 2017

Learning Multilingual Joint Sentence Embeddings with Neural Machine Translation

Holger Schwenk, Matthijs Douze

Downloads & Projects

WikiMatrix is a corpus of parallel sentences produced by the project described in "WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia". The goal of this project is to mine parallel sentences from the textual content of Wikipedia for all possible language pairs.

LASER is a library for computing multilingual sentence embeddings.