Francisco (Paco) Guzman

Research Scientist

I work in the Language and Translation Technology (LATTE) group in Facebook’s Applied Machine Learning (AML) division, where my research focuses on machine translation. My research has been published in top-tier NLP venues such as ACL and EMNLP. I have participated in several machine translation competitions, obtaining top rankings for the Arabic-English and Spanish-English language pairs. I have contributed to machine translation evaluation using discourse information, winning the WMT2014 metrics evaluation campaign. I obtained my PhD from ITESM in Mexico, was a visiting scholar at LTI-CMU from 2008 to 2009, and participated in DARPA’s GALE evaluation program. From 2012 to 2016, I was a post-doc and scientist at the Qatar Computing Research Institute.


Machine translation and natural language processing

Related Links

Google Scholar

Latest Publications

EMNLP - November 16, 2020

CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs

Ahmed El-Kishky, Vishrav Chaudhary, Francisco (Paco) Guzman, Philipp Koehn

TACL - August 31, 2020

Unsupervised Quality Estimation for Neural Machine Translation

Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco (Paco) Guzman, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia

ACL - July 6, 2020

Are we Estimating or Guesstimating Translation Quality?

Shuo Sun, Francisco (Paco) Guzman, Lucia Specia

EMNLP - October 31, 2019

The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali–English and Sinhala–English

Francisco (Paco) Guzman, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

ACL - August 2, 2019

Low-Resource Corpus Filtering using Multilingual Sentence Embeddings

Vishrav Chaudhary, Yuqing Tang, Francisco (Paco) Guzman, Holger Schwenk, Philipp Koehn

WMT at ACL - August 2, 2019

Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

Philipp Koehn, Francisco (Paco) Guzman, Vishrav Chaudhary, Juan Pino

CHI - May 4, 2019

Design and Evaluation of a Social Media Writing Support Tool for People with Dyslexia

Shaomei Wu, Lindsay Reynolds, Xian Li, Francisco (Paco) Guzman

Downloads & Projects


A public benchmark for low-resource machine translation

FLoRes is a dataset that can be used to reproduce the experiments…

WikiMatrix is a corpus of parallel sentences used in the project outlined in WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia. The goal of this project is to mine for parallel sentences in the textual content of Wikipedia for all possible language pairs.