Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming 论文

2011Edinburgh Research Explorer (University of Edinburgh)引用 218
Text Readability and SimplificationNatural Language Processing TechniquesTopic Modeling

摘要

Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most pre-vious work simplifies sentences using hand-crafted rules aimed at splitting long sentences, or substitutes difficult words using a prede-fined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally capture structural mismatches and complex rewrite operations. We describe how such a grammar can be induced from Wikipedia and propose an integer linear programming model for selecting the most appropriate simplifica-tion from the space of possible rewrites gen-erated by the grammar. We show experimen-tally that our method creates simplifications that significantly reduce the reading difficulty of the input, while maintaining grammaticality and preserving its meaning. 1