CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes (paper)
Abstract
The CoNLL-2012 shared task involved predicting coreference in English, Chinese, and Arabic, using the final version, v5.0, of the OntoNotes corpus. It was a follow-on to the English-only task organized in 2011. Until the creation of the OntoNotes corpus, resources in this sub-field of language processing were limited to noun phrase coreference, often on a restricted set of entities, such as ACE entities. OntoNotes provides a large-scale corpus of general anaphoric coreference that is not restricted to noun phrases or to a specified set of entity types, and that covers multiple languages. OntoNotes also provides additional layers of integrated annotation, capturing additional shallow semantic structure. This paper describes the OntoNotes annotation (coreference and other layers), then describes the parameters of the shared task, including the format, pre-processing information, and evaluation criteria, and finally presents and discusses the results achieved by the participating systems. The task of coreference has had a complex evaluation history. The potentially many evaluation conditions have, in the past, made it difficult to judge the improvement of new algorithms over previously reported results. Having a standard test set and evaluation parameters, all based on a resource that provides multiple integrated annotation layers (parses, semantic roles, word senses, named entities, and coreference) in multiple languages, could support joint models and should help ground and energize ongoing research in the task of entity and event coreference.