Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming (Paper)
Abstract
Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most previous work simplifies sentences using hand-crafted rules aimed at splitting long sentences, or substitutes difficult words using a predefined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally capture structural mismatches and complex rewrite operations. We describe how such a grammar can be induced from Wikipedia and propose an integer linear programming model for selecting the most appropriate simplification from the space of possible rewrites generated by the grammar. We show experimentally that our method creates simplifications that significantly reduce the reading difficulty of the input, while maintaining grammaticality and preserving its meaning.