Extracting company names from text (paper)

2002 · Cited by 259
Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques

Abstract

A detailed description is given of an implemented algorithm that extracts company names automatically from financial news. Extracting company names from text is one problem; recognizing subsequent references to a company is another. The author addresses both problems in an implemented, well-tested module that operates as a detachable process from a set of natural language processing tools. She implements a good algorithm by combining heuristics, exception lists and extensive corpus analysis. The algorithm generates the most likely variations that those names may go by, for use in subsequent retrieval. Tested on over one million words of naturally occurring financial news, the system has extracted thousands of company names with over 95% accuracy (precision) compared to a human, and succeeded in extracting 25% more companies than were indexed by a human.
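
The abstract names three ingredients (heuristics, exception lists, and generated name variations) without giving details. The following is a minimal Python sketch of how such pieces might fit together, not the paper's actual algorithm. The suffix list, the exception list, and the helper names `extract_company_names` and `name_variations` are all assumptions made for illustration.

```python
import re

# Assumed company-suffix indicators; the abstract does not list the real ones.
SUFFIXES = ("Inc", "Corp", "Co", "Ltd")

# Hypothetical exception list: capitalized words that often begin a sentence
# and should not be treated as the first word of a company name.
EXCEPTIONS = {"The", "A", "An", "But", "And", "Meanwhile"}

# Heuristic: one or more capitalized words followed by a company suffix,
# with an optional trailing period.
PATTERN = re.compile(
    r"\b((?:[A-Z][A-Za-z&]*\s+)+)(" + "|".join(SUFFIXES) + r")\b\.?"
)


def extract_company_names(text):
    """Return candidate company names in order of first appearance."""
    names = []
    for match in PATTERN.finditer(text):
        words = match.group(1).split()
        # Apply the exception list to leading words only.
        while words and words[0] in EXCEPTIONS:
            words = words[1:]
        if not words:
            continue
        name = " ".join(words + [match.group(2)])
        if name not in names:
            names.append(name)
    return names


def name_variations(name):
    """Generate simple variations a later reference might use."""
    words = name.split()
    stem = words[:-1]  # drop the suffix
    variations = [name, " ".join(stem)]
    if len(stem) > 1:
        variations.append(stem[0])                      # first word only
        variations.append("".join(w[0] for w in stem))  # initials
    return variations


if __name__ == "__main__":
    sample = "Meanwhile Acme Widget Corp. said it will buy Delta Co. The deal closes in May."
    for company in extract_company_names(sample):
        print(company, "->", name_variations(company))
```

A suffix-anchored pattern like this favors precision over recall: it misses names that appear without a suffix, which is the gap the generated variations are meant to cover when matching subsequent references.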
