Are we Estimating or Guesstimating Translation Quality?

Association for Computational Linguistics (ACL)


Recent advances in pre-trained multilingual language models have led to state-of-the-art results on the task of quality estimation (QE) for machine translation. A carefully engineered ensemble of such models dominated the QE shared task at WMT 2019. Our in-depth analysis, however, shows that the success of using pre-trained language models for QE is overestimated due to three issues we observed in current QE datasets: (i) the distributions of quality scores are imbalanced and skewed towards good quality scores; (ii) QE models can perform well on these datasets without even ingesting source or translated sentences; (iii) they contain statistical artifacts that correlate well with human-annotated QE labels. Our findings suggest that although QE models might capture the fluency of translated sentences and the complexity of source sentences, they cannot model the adequacy of translations effectively.
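Issue (i), the skew toward good quality scores, can be checked directly on a dataset's label distribution. Below is a minimal, hypothetical sketch (the scores are synthetic, not drawn from any actual QE dataset) that computes the Fisher-Pearson skewness coefficient; a strongly negative value on a 0-100 scale indicates scores bunched near the top:

```python
import numpy as np

def skewness(scores):
    """Fisher-Pearson coefficient of skewness: E[(x - mu)^3] / sigma^3."""
    scores = np.asarray(scores, dtype=float)
    mu, sigma = scores.mean(), scores.std()
    return ((scores - mu) ** 3).mean() / sigma ** 3

rng = np.random.default_rng(0)
# Simulate quality scores clustered near the top of a 0-100 scale,
# mimicking a dataset dominated by good translations (synthetic data).
scores = np.clip(rng.normal(loc=85, scale=10, size=10_000), 0, 100)
print(f"skewness = {skewness(scores):.2f}")  # negative => left-skewed toward high scores
```

A diagnostic like this, run before training, makes it easy to spot when a regression model could score well simply by predicting values near the mode of the label distribution.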

Related Publications

EMNLP - November 10, 2021

Cross-Policy Compliance Detection via Question Answering

Marzieh Saeidi, Majid Yazdani, Andreas Vlachos

EMNLP - November 7, 2021

Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications

Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia

EMNLP - October 31, 2021

Evaluation Paradigms in Question Answering

Pedro Rodriguez, Jordan Boyd-Graber

NAACL - June 6, 2021

Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking

Zhaojiang Lin, Bing Liu, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Andrea Madotto, Eunjoon Cho, Rajen Subba
