Sentence Extraction as a Classification Task (paper)

1997, cited by 217
Natural Language Processing Techniques; Topic Modeling; Sentiment Analysis and Opinion Mining

Abstract

A useful first step in document summarisation is the selection of a small number of 'meaningful' sentences from a larger text. Kupiec et al. (1995) describe this as a classification task: on the basis of a corpus of technical papers with summaries written by professional abstractors, their system identifies those sentences in the text which also occur in the summary, and then acquires a model of the 'abstract-worthiness' of a sentence as a combination of a limited number of properties of that sentence. We report on a replication of this experiment with different data: summaries for our documents were not written by professional abstractors, but by the authors themselves. This produced fewer alignable sentences to train on. We use alternative 'meaningful' sentences (selected by a human judge) as training and evaluation material, because this has advantages for the subsequent automatic generation of more flexible abstracts. We quantitatively compare the two different strategies for train...
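
The classification framing can be made concrete with a small sketch. It assumes the sentence properties are binary and are combined with a naive Bayes model; the abstract only says "a combination of a limited number of properties", so the choice of model is an assumption here. The feature names (has_cue_phrase, is_paragraph_initial) and the toy training data are hypothetical and exist only to make the example runnable.

```python
import math
from collections import defaultdict


def train(examples, alpha=1.0):
    """Fit a naive Bayes model over binary sentence features.

    examples: list of (features, in_summary) pairs, where features maps a
    (hypothetical) feature name to True/False and in_summary is the gold label.
    alpha is an add-alpha smoothing constant, an assumption of this sketch.
    """
    n = {True: 0, False: 0}
    counts = {True: defaultdict(int), False: defaultdict(int)}
    names = set()
    for feats, label in examples:
        n[label] += 1
        for name, value in feats.items():
            names.add(name)
            if value:
                counts[label][name] += 1
    total = n[True] + n[False]
    model = {"features": sorted(names), "prior": {}, "likelihood": {}}
    for label in (True, False):
        model["prior"][label] = (n[label] + alpha) / (total + 2 * alpha)
        model["likelihood"][label] = {
            name: (counts[label][name] + alpha) / (n[label] + 2 * alpha)
            for name in names
        }
    return model


def score(model, feats):
    """Return P(sentence belongs in the summary | its features)."""
    log_post = {}
    for label in (True, False):
        lp = math.log(model["prior"][label])
        for name in model["features"]:
            p = model["likelihood"][label][name]
            lp += math.log(p if feats.get(name, False) else 1.0 - p)
        log_post[label] = lp
    # Normalise in log space to avoid underflow with many features.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return math.exp(log_post[True] - m) / z


def extract(model, sentences, k):
    """Return the indices of the k highest-scoring sentences, in document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(model, sentences[i]),
                    reverse=True)
    return sorted(ranked[:k])


# Toy training set: invented labels, standing in for either aligned
# summary sentences or sentences picked by a human judge.
training = [
    ({"has_cue_phrase": True, "is_paragraph_initial": True}, True),
    ({"has_cue_phrase": True, "is_paragraph_initial": False}, True),
    ({"has_cue_phrase": False, "is_paragraph_initial": True}, False),
    ({"has_cue_phrase": False, "is_paragraph_initial": False}, False),
    ({"has_cue_phrase": False, "is_paragraph_initial": False}, False),
]
model = train(training)

document = [
    {"has_cue_phrase": False, "is_paragraph_initial": False},
    {"has_cue_phrase": True, "is_paragraph_initial": True},
    {"has_cue_phrase": False, "is_paragraph_initial": True},
]
selected = extract(model, document, k=1)
```

In this sketch, switching between the two training strategies mentioned in the abstract would only change where the True/False labels in the training list come from; the model and the extraction step stay the same.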