Simple Semi-supervised Dependency Parsing (Paper)

2008 | RECERCAT (Consorci de Serveis Universitaris de Catalunya) | Citations: 444
Natural Language Processing Techniques; Topic Modeling; Text Readability and Simplification

Details

Journal/Conference
RECERCAT (Consorci de Serveis Universitaris de Catalunya)
Publication date
2008-06-01
Publication year
2008

Keywords

Natural Language Processing Techniques; Topic Modeling; Text Readability and Simplification

Abstract

We present a simple and effective semi-supervised method for training dependency parsers. We focus on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus. We demonstrate the effectiveness of the approach in a series of dependency parsing experiments on the Penn Treebank and Prague Dependency Treebank, and we show that the cluster-based features yield substantial gains in performance across a wide range of conditions. For example, in the case of English unlabeled second-order parsing, we improve from a baseline accuracy of 92.02% to 93.16%, and in the case of Czech unlabeled second-order parsing, we improve from a baseline accuracy of 86.13% to 87.13%. In addition, we demonstrate that our method also improves performance when small amounts of training data are available, and can roughly halve the amount of supervised data required to reach a desired level of performance.
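
The idea of cluster-based features can be illustrated with a minimal sketch. The abstract does not specify the cluster format or the feature templates, so everything below is an assumption made for illustration: clusters are taken to be hierarchical bit strings that can be truncated to a prefix, the `CLUSTERS` table holds made-up values, and the names `cluster_prefix` and `arc_features` are hypothetical.

```python
from typing import Dict, List

# Hypothetical word -> cluster bit-string map. In practice this would be
# derived from a large unannotated corpus; these values are invented.
CLUSTERS: Dict[str, str] = {
    "dog": "0110100",
    "cat": "0110101",
    "barks": "1011000",
}


def cluster_prefix(word: str, length: int) -> str:
    """Return the first `length` bits of the word's cluster id, or 'UNK'."""
    bits = CLUSTERS.get(word.lower())
    return bits[:length] if bits is not None else "UNK"


def arc_features(head: str, modifier: str, prefix_len: int = 4) -> List[str]:
    """Build string features for a candidate head -> modifier arc."""
    hw, mw = head.lower(), modifier.lower()
    hc = cluster_prefix(head, prefix_len)
    mc = cluster_prefix(modifier, prefix_len)
    return [
        f"hw={hw}|mw={mw}",  # purely lexical feature (baseline style)
        f"hc={hc}|mc={mc}",  # cluster-cluster feature
        f"hc={hc}|mw={mw}",  # hybrid: head cluster, modifier word
        f"hw={hw}|mc={mc}",  # hybrid: head word, modifier cluster
    ]


if __name__ == "__main__":
    for feature in arc_features("barks", "dog"):
        print(feature)
```

In this sketch, "dog" and "cat" share the same 4-bit prefix, so an arc seen in training with one of them fires the same cluster-cluster feature for the other. That sharing is one plausible way such features could help with sparse lexical statistics.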