Publication

Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

Conference on Machine Translation (WMT) at ACL


Abstract

Following the WMT 2018 Shared Task on Parallel Corpus Filtering (Koehn et al., 2018), we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2% and 10% of the highest-quality data to be used to train machine translation systems. This year, the task tackled the low resource condition of Nepali– English and Sinhala–English. Eleven participants from companies, national research labs, and universities participated in this task.

Related Publications

All Publications

EMNLP - October 31, 2021

Evaluation Paradigms in Question Answering

Pedro Rodriguez, Jordan Boyd-Graber

NAACL - June 6, 2021

Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking

Zhaojiang Lin, Bing Liu, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Andrea Madotto, Eunjoon Cho, Rajen Subba

EMNLP - November 16, 2020

Abusive Language Detection using Syntactic Dependency Graphs

Kanika Narang, Chris Brew

Interspeech - August 31, 2021

slimIPL: Language-Model-Free Iterative Pseudo-Labeling

Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy