Improving stemming for Arabic information retrieval (paper)

2002, cited by 360

Natural Language Processing Techniques Topic Modeling Speech and dialogue systems

Abstract

Arabic, a highly inflected language, requires good stemming for effective information retrieval, yet no standard approach to stemming has emerged. We developed several light stemmers based on heuristics and a statistical stemmer based on co-occurrence for Arabic retrieval. We compared the retrieval effectiveness of our stemmers and of a morphological analyzer on the TREC-2001 data. The best light stemmer was more effective for cross-language retrieval than a morphological stemmer which tried to find the root for each word. A repartitioning process consisting of vowel removal followed by clustering using co-occurrence analysis produced stem classes which were better than no stemming or very light stemming, but still inferior to good light stemming or morphological analysis.

Authors

Margaret E. Connell

Lisa Ballesteros

Leah S. Larkey
