
Is there a way to use Tesseract to recognize single-digit numbers?


TL;DR: Tesseract appears unable to recognize images that contain only a single digit. Is there a workaround, or a reason for this?

I am using (the digits-only version of) Tesseract to automate entering invoices into our system. However, I noticed that Tesseract seems unable to recognize single-digit numbers such as the following:

The raw scan after cropping:

[image omitted]

After some image enhancement:

[image omitted]

Recognition works fine when the image contains at least two digits:

[two images omitted]

I've tested a few other figures:

Not working: [three images omitted]

Working: [three images omitted]

If it helps: for my purposes, every input to Tesseract has been cropped and rotated as above. I am using pyocr as a bridge between my project and Tesseract.


Solution

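  • Tesseract handles individual digits the same way as any other character, so the digit itself is not the problem; the page segmentation mode (PSM) is. The default mode (3, fully automatic page segmentation) expects a page of text and often fails on an image holding a single glyph. Switching to mode 10 (treat the image as a single character), or mode 8 (treat the image as a single word) for short multi-digit fields, should help Tesseract pick up the digits correctly.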

    See also: Tesseract does not recognize single characters
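The fix amounts to passing a different page segmentation mode. Below is a minimal sketch, assuming Tesseract 4 or later (which spells the flag `--psm`; 3.x used `-psm`) and a pyocr version whose `pyocr.tesseract.DigitBuilder` accepts a `tesseract_layout` argument. The helper names (`build_tesseract_cmd`, `ocr_digits_cli`, `ocr_digits_pyocr`) and the `PSM_*` constants are hypothetical and not part of either tool.

```python
import shutil
import subprocess
from typing import List

# Page segmentation modes (values as documented for the tesseract CLI).
PSM_AUTO = 3          # default: fully automatic page segmentation, no OSD
PSM_SINGLE_LINE = 7   # treat the image as a single text line
PSM_SINGLE_WORD = 8   # treat the image as a single word
PSM_SINGLE_CHAR = 10  # treat the image as a single character


def build_tesseract_cmd(image_path: str, psm: int = PSM_SINGLE_CHAR,
                        digits_only: bool = True) -> List[str]:
    """Return a tesseract CLI command (Tesseract 4+ syntax) that writes text to stdout."""
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    if digits_only:
        # Restrict the output alphabet to digits; assumes a build that honors
        # the whitelist (the LSTM engine does so from Tesseract 4.1).
        cmd += ["-c", "tessedit_char_whitelist=0123456789"]
    return cmd


def ocr_digits_cli(image_path: str, psm: int = PSM_SINGLE_CHAR) -> str:
    """Run the tesseract binary directly and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    result = subprocess.run(build_tesseract_cmd(image_path, psm),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def ocr_digits_pyocr(image_path: str, psm: int = PSM_SINGLE_CHAR) -> str:
    """Same idea through pyocr: give DigitBuilder an explicit layout instead of the default."""
    # Imported lazily so the helpers above work without pyocr/Pillow installed.
    import pyocr.tesseract
    from PIL import Image

    if not pyocr.tesseract.is_available():
        raise RuntimeError("pyocr cannot find tesseract")
    builder = pyocr.tesseract.DigitBuilder(tesseract_layout=psm)
    with Image.open(image_path) as img:
        return pyocr.tesseract.image_to_string(img, lang="eng", builder=builder).strip()
```

For example, `ocr_digits_pyocr("digit.png")` uses mode 10 for a cropped single digit, while `ocr_digits_pyocr("amount.png", psm=PSM_SINGLE_WORD)` may suit fields that can hold several digits. Compared with a plain `DigitBuilder()`, which defaults to automatic segmentation (layout 3), the only change is the explicit layout.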