Holger Schwenk

Research Scientist

I am a research scientist at Facebook Artificial Intelligence Research, Paris. I received my PhD in computer science from the University Paris 6 in 1996. I then spent one year at the University of Montreal working with Y. Bengio and one year at the International Computer Science Institute in Berkeley. From 1998 to 2007, I held an assistant professor position at the University Paris 11/LIMSI. Prior to joining Facebook in 2015, I was a professor of computer science at the University of Le Mans, where I led a large group on statistical machine translation.

In 2013, I was named a senior member of the Institut Universitaire de France.


Natural language processing, machine translation, human-machine interaction and deep neural networks

Latest Publications

Low-Resource Corpus Filtering using Multilingual Sentence Embeddings

Vishrav Chaudhary, Yuqing Tang, Francisco (Paco) Guzman, Holger Schwenk, Philipp Koehn

ACL - August 2, 2019

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

Mikel Artetxe, Holger Schwenk

ACL - July 27, 2019

XNLI: Evaluating Cross-lingual Sentence Representations

Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel R. Bowman, Holger Schwenk, Ves Stoyanov

EMNLP 2018 - October 29, 2018

A Corpus for Multilingual Document Classification in Eight Languages

Holger Schwenk, Xian Li

LREC 2018 - May 7, 2018

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes

EMNLP 2017 - September 7, 2017

Learning Multilingual Joint Sentence Embeddings with Neural Machine Translation

Holger Schwenk, Matthijs Douze

ACL Workshop on Representation Learning for NLP - July 31, 2017

Downloads & Projects


WikiMatrix is a corpus of parallel sentences used in the project outlined in WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia. The goal of this project is to mine for parallel sentences in the textual content of Wikipedia for all possible language pairs.

LASER is a library to calculate multilingual sentence embeddings.