Extracting company names from text (paper)

2002 · Cited by 259

Topics: Natural Language Processing Techniques; Topic Modeling; Advanced Text Analysis Techniques

Abstract

A detailed description is given of an implemented algorithm that automatically extracts company names from financial news. Extracting company names from text is one problem; recognizing subsequent references to the same company is another. The author addresses both problems in an implemented, well-tested module that can run as a process detached from the rest of a set of natural language processing tools. The algorithm combines heuristics, exception lists, and extensive corpus analysis, and it generates the most likely variant forms of each name (the shorter forms a later reference may use) for use in subsequent retrieval. Tested on over one million words of naturally occurring financial news, the system extracted thousands of company names with over 95% precision relative to a human, and it found 25% more companies than a human indexer had.
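The abstract names the ingredients (heuristics, exception lists, name variants) but not the rules themselves, so the following is only a minimal Python sketch under stated assumptions, not Rau's algorithm. It assumes one common heuristic: find a corporate designator such as "Inc." or "Corp." and walk backward over capitalized words to recover the full name. The designator list, the excluded-word list, and the variant rules (drop the designator, keep the first word, form initials) are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical designator list; the paper's actual lists are not given in the abstract.
DESIGNATORS = {"Inc", "Corp", "Co", "Ltd", "Corporation", "Incorporated", "Company"}
ABBREVIATED = {"Inc", "Corp", "Co", "Ltd"}  # normally written with a trailing period
# Hypothetical exception list: capitalized words that must not begin a name.
EXCLUDED = {"The", "A", "An", "In", "On", "At", "By", "For", "Of"}

# Words (keeping internal/trailing periods, '&', apostrophes, hyphens) or punctuation.
TOKEN_RE = re.compile(r"[A-Za-z0-9&'.-]+|[,;:]")


def _is_name_word(tok):
    """True for '&' or a capitalized, non-excluded word without a trailing period
    (a trailing period is treated as a sentence boundary)."""
    if tok == "&":
        return True
    return tok[0].isupper() and tok not in EXCLUDED and not tok.endswith(".")


def extract_company_names(text):
    """Return company names in order of first appearance."""
    tokens = TOKEN_RE.findall(text)
    names = []
    for i, tok in enumerate(tokens):
        key = tok.rstrip(".")
        if key not in DESIGNATORS:
            continue
        start = i
        while start > 0 and _is_name_word(tokens[start - 1]):
            start -= 1
        if start < i and tokens[start] == "&":
            start += 1  # a name should not begin with '&'
        if start == i:
            continue  # no name before the designator, e.g. "Company officials"
        designator = key + "." if key in ABBREVIATED else key
        names.append(" ".join(tokens[start:i] + [designator]))
    return list(dict.fromkeys(names))  # drop repeats, keep first-seen order


def variants(name):
    """Illustrative (assumed) shorter forms a later reference might use."""
    core = name.split()[:-1]  # drop the corporate designator
    forms = []
    if core:
        forms.append(" ".join(core))  # e.g. "Acme Computer"
    if len(core) > 1:
        forms.append(core[0])  # e.g. "Acme"
        initials = "".join(w[0] for w in core if w[0].isupper())
        if len(initials) > 1:
            forms.append(initials)  # e.g. "AC"
    return forms


def alias_table(text):
    """Map every full name and each of its variants back to the full name."""
    table = {}
    for name in extract_company_names(text):
        for alias in [name] + variants(name):
            table.setdefault(alias, name)  # first company to claim an alias keeps it
    return table


sample = ("Acme Computer Corp. said Monday that profit rose. Acme also said "
          "Widget Inc. would buy Procter & Gamble Co. shares.")
print(extract_company_names(sample))
# ['Acme Computer Corp.', 'Widget Inc.', 'Procter & Gamble Co.']
print(alias_table(sample)["Acme"])
# Acme Computer Corp.
```

The alias table is what lets the later mention "Acme" resolve to "Acme Computer Corp.". The same rules would also map a bare "Boston" to a company named "Boston Gear Company", which illustrates why a real system needs exception lists and corpus analysis rather than variant rules alone.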

Author

Lisa F. Rau
