A New Approach to the Study of Translationese: Machine-learning the Difference between Original and Translated Text

2005, Literary and Linguistic Computing. Cited by 222.
Topics: Authorship Attribution and Profiling; Natural Language Processing Techniques; Topic Modeling

Abstract

In this paper we describe an approach to the identification of "translationese" based on monolingual comparable corpora and machine learning techniques for text categorization. The paper reports on experiments in which support vector machines (SVMs) are employed to recognize translated text in a corpus of Italian articles from the geopolitical domain. An ensemble of SVMs reaches 86.7% accuracy with 89.3% precision and 83.3% recall on this task. A preliminary analysis of the features used by the SVMs suggests that the distribution of function words and morphosyntactic categories in general, and personal pronouns and adverbs in particular, are among the cues the SVMs use to perform the discrimination task. A follow-up experiment shows that the performance attained by SVMs is well above the average performance of 10 human subjects, including 5 professional translators, on the same task. Our results offer solid evidence supporting the translationese hypothesis, and our method seems to have promising applications in translation studies and, more generally, in quantitative style analysis. Implications for the machine learning/text categorization community are equally important, because this is a novel application, and especially because we provide explicit evidence that a relatively knowledge-poor machine learning algorithm can outperform human beings in a text classification task.
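
The abstract names the classifier (SVMs) and hints at the cues (function words, pronouns, adverbs) but does not spell out the setup. The following is a rough sketch of the general idea under stated assumptions, not the authors' implementation. It turns each text into relative frequencies of a few function words and trains a linear SVM with Pegasos-style stochastic sub-gradient descent, a solver chosen here only for brevity. The function-word list, the toy sentences and their labels, and the names `featurize`, `train_linear_svm`, `predict` and `top_features` are all hypothetical. Listing the largest weights of a linear model is one simple way to see which features it relies on, in the spirit of the feature analysis the abstract mentions.

```python
import random
from collections import Counter

# Hypothetical feature set: a few Italian function words (illustration only; the
# abstract does not list the features the authors actually used).
FUNCTION_WORDS = ["il", "la", "di", "che", "e", "non", "lui", "lei", "molto", "anche"]


def featurize(text, vocab=FUNCTION_WORDS):
    """Relative frequency of each vocabulary word among the whitespace tokens of text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid dividing by zero on empty input
    return [counts[word] / total for word in vocab]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors, y a list of labels in {+1, -1}.
    No bias term is learned, to keep the sketch short.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # The regularizer shrinks every weight at each step...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ...and a margin violation (margin < 1) moves w toward y_i * x_i.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Sign of the linear score: +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def top_features(w, vocab=FUNCTION_WORDS, k=3):
    """The k vocabulary items with the largest absolute weight."""
    return sorted(zip(vocab, w), key=lambda pair: abs(pair[1]), reverse=True)[:k]


# Made-up toy data: +1 stands for "translated", -1 for "original". Which words
# go with which class is arbitrary and says nothing about real translations.
train = [
    ("lui disse che lui era stanco", 1),
    ("lei pensa che lei ha ragione", 1),
    ("un viaggio molto lungo e molto bello", -1),
    ("anche oggi fa molto freddo", -1),
]
X = [featurize(text) for text, _ in train]
y = [label for _, label in train]
w = train_linear_svm(X, y)

print(predict(w, featurize("lui non sa che fare")))       # 1
print(predict(w, featurize("fa molto caldo anche qui")))  # -1
print(top_features(w))
```

Because the two toy classes share no function words, the learned weights end up positive only on words from the "+1" sentences and negative only on words from the "-1" sentences. Real corpora are far noisier, which is why a richer feature set and a proper evaluation matter.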

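The abstract reports the ensemble's accuracy, precision and recall without saying how the SVMs are combined. The sketch below shows how those three figures follow from a confusion matrix. It assumes an odd number of members combined by simple majority vote, which is an assumption and not something the abstract states, and it treats "translated" as the positive class. The helper names and the gold and member labels are invented toy values.

```python
def majority_vote(member_predictions):
    """Combine several classifiers' labels (+1 / -1) per document by majority vote.

    Assumes an odd number of members, so a vote can never tie.
    """
    return [1 if sum(votes) > 0 else -1 for votes in zip(*member_predictions)]


def confusion(gold, pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` (here: translated) as the target class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    tn = sum(g != positive and p != positive for g, p in zip(gold, pred))
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Accuracy, precision and recall for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Invented toy labels: +1 = translated, -1 = original.
gold = [1, 1, 1, -1, -1, -1]
members = [
    [1, 1, -1, -1, 1, -1],
    [1, -1, 1, -1, -1, -1],
    [1, 1, -1, 1, -1, -1],
]
voted = majority_vote(members)
counts = confusion(gold, voted)
print(voted)            # [1, 1, -1, -1, -1, -1]
print(counts)           # (2, 0, 1, 3)
print(scores(*counts))  # accuracy 5/6, precision 1.0, recall 2/3
```

Precision is the share of texts flagged as translated that really are translated; recall is the share of translated texts that get flagged. Since both share the same true-positive count, the reported 89.3% precision against 83.3% recall implies that the ensemble missed more translated texts than it mislabelled original ones.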