Taking MT evaluation metrics to extremes: beyond correlation with human judgments

File: coli_a_00356.pdf
Description: Published version
Size: 1.14 MB
Format: Adobe PDF
Title: Taking MT evaluation metrics to extremes: beyond correlation with human judgments
Authors: Fomicheva, M
Specia, L
Item Type: Journal Article
Abstract: Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new metrics devised every year. Evaluation metrics are generally benchmarked against manual assessment of translation quality, with performance measured in terms of overall correlation with human scores. Much work has been dedicated to the improvement of evaluation metrics to achieve a higher correlation with human judgments. However, little insight has been provided regarding the weaknesses and strengths of existing approaches and their behavior in different settings. In this work we conduct a broad meta-evaluation study of the performance of a wide range of evaluation metrics focusing on three major aspects. First, we analyze the performance of the metrics when faced with different levels of translation quality, proposing a local dependency measure as an alternative to the standard, global correlation coefficient. We show that metric performance varies significantly across different levels of MT quality: Metrics perform poorly when faced with low-quality translations and are not able to capture nuanced quality distinctions. Interestingly, we show that evaluating low-quality translations is also more challenging for humans. Second, we show that metrics are more reliable when evaluating neural MT than the traditional statistical MT systems. Finally, we show that the difference in the evaluation accuracy for different metrics is maintained even if the gold standard scores are based on different criteria.
Issue Date: 1-Sep-2019
Date of Acceptance: 12-Jun-2019
URI: http://hdl.handle.net/10044/1/79480
DOI: 10.1162/coli_a_00356
ISSN: 0891-2017
Publisher: MIT Press
Start Page: 515
End Page: 558
Journal / Book Title: Computational Linguistics
Volume: 45
Issue: 3
Copyright Statement: © 2019 Association for Computational Linguistics Published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license
Keywords: Science & Technology
Social Sciences
Technology
Computer Science, Artificial Intelligence
Computer Science, Interdisciplinary Applications
Linguistics
Language & Linguistics
Computer Science
LOCAL GAUSSIAN CORRELATION
INTERDEPENDENCE
DEPENDENCE
CONTAGION
Artificial Intelligence & Image Processing
0801 Artificial Intelligence and Image Processing
1702 Cognitive Sciences
2004 Linguistics
Publication Status: Published
Online Publication Date: 2019-09
Appears in Collections: Computing