Image Thresholding for Optical Character Recognition and Other Applications Requiring Character Image Extraction (paper)
Abstract
Two new, cost-effective thresholding algorithms for extracting binary images of characters from machine- or hand-printed documents are described. To create a binary representation from an analog image, such an algorithm must decide for each point whether it becomes a binary one, because it falls within a character stroke, or a binary zero, because it does not. This thresholding is a critical step in Optical Character Recognition (OCR). It is also essential for other Character Image Extraction (CIE) applications, such as processing machine-printed or handwritten characters on carbon copy forms or bank checks, where, for example, smudges and scenic backgrounds may have to be suppressed. The first algorithm, a nonlinear, adaptive procedure, is implemented with a minimum of hardware and is intended for many CIE applications. The second is a more aggressive approach aimed at specialized, high-volume applications that justify the extra complexity.
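The abstract does not spell out either algorithm, so the sketch below does not reproduce them. It only illustrates the per-point decision described above (1 inside a stroke, 0 outside) and why a fixed threshold struggles with shading. It contrasts a global threshold with a generic local-mean adaptive threshold. That is a standard technique swapped in for illustration, not the paper's nonlinear procedure. Assumptions: gray levels run from 0 (black) to 255 (white) with dark ink on a lighter background. The names `global_threshold`, `adaptive_threshold`, `radius`, `offset` and the sample `page` are all hypothetical.

```python
from typing import List

# Assumed convention: gray levels 0 (black ink) .. 255 (white paper),
# rows of equal length. Output: 1 = stroke point, 0 = background point.
Image = List[List[int]]


def global_threshold(image: Image, t: int) -> Image:
    """Fixed threshold: a point becomes 1 if it is darker than t, else 0."""
    return [[1 if p < t else 0 for p in row] for row in image]


def _integral(image: Image) -> List[List[int]]:
    """Summed-area table with a leading zero row/column for O(1) window sums."""
    h, w = len(image), len(image[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def adaptive_threshold(image: Image, radius: int = 7, offset: int = 10) -> Image:
    """Generic local-mean threshold (hypothetical helper, not the paper's method):
    a point becomes 1 if it is darker than the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood, clipped at the borders,
    by more than `offset` gray levels."""
    h, w = len(image), len(image[0])
    s = _integral(image)
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(1 if image[y][x] < mean - offset else 0)
        out.append(row)
    return out


# Hypothetical sample: the background darkens from left to right (e.g. a shadow),
# with a faint stroke point at column 1 and a dark stroke point at column 6.
page = [
    [200, 185, 170, 155, 140, 125, 110, 95],
    [200, 110, 170, 155, 140, 125, 40, 95],
    [200, 185, 170, 155, 140, 125, 110, 95],
]

# The fixed threshold misses the faint stroke and marks the shaded
# background in column 7 as ink.
for row in global_threshold(page, 100):
    print(row)

# The local-mean threshold keeps both stroke points and rejects the shading.
for row in adaptive_threshold(page, radius=1, offset=20):
    print(row)
```

In practice `radius` would be chosen relative to stroke width and `offset` relative to the contrast of the ink. The summed-area table keeps each window mean constant-time regardless of `radius`.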