I have some images that contain only digits and a colon.
Example images can be seen here: https://imgur.com/a/54dsl6h
They look pretty clean and straightforward to me, but Tesseract treats them as empty pages (it prints "Empty page!!").
I tried both --oem 1 and --oem 0, the latter with a character whitelist:
tesseract processed/35.0.png stdout -c tessedit_char_whitelist=0123456789: --oem 0
tesseract processed/35.0.png stdout
What can I do to get Tesseract to recognize the characters better?
Tesseract still gives me pretty bad results overall, but making the text bolder with a simple dilation helped a bit.
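A minimal pure-Python sketch of that kind of dilation, assuming the image has already been binarised into a list of rows where 1 marks an ink (text) pixel. The function name `dilate` and the 3x3 square neighbourhood are illustrative choices, not the exact algorithm used here. In practice OpenCV's `cv2.dilate` does the same thing much faster; note that with dark text on a light background the image has to be inverted first (or eroded instead), otherwise the strokes get thinner rather than bolder.

```python
def dilate(img, iterations=1):
    """Binary dilation with a 3x3 square structuring element.

    img is a list of rows of 0/1 ints, where 1 marks an ink pixel
    (assumed convention). Each iteration grows every stroke by one
    pixel in all directions.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    for _ in range(iterations):
        src = out
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # A pixel becomes ink if any pixel in its 3x3 neighbourhood
                # (clipped at the image border) is ink.
                out[y][x] = int(any(
                    src[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                ))
    return out


# Usage: a 1-pixel-wide vertical stroke becomes 3 pixels wide.
thin = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in dilate(thin):
    print(row)
```

One dilation pass turns each row of the example into `[0, 1, 1, 1, 0]`; running more iterations keeps thickening the strokes, so the count is worth tuning against how close the digits are to each other.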
In the end, since the font is very square, I used a trick: I defined a set of segments for each digit, and based on which segments do or don't intersect the digit's strokes, I can determine which digit it is with about 99% accuracy.
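The exact segments used aren't given, so the following is only a hypothetical sketch of the idea, assuming a seven-segment-like square font and a binary glyph (1 = ink) already cropped to a single digit. The probe positions in `PROBES`, the names `probe_hits` and `classify`, and the narrow-glyph rule for "1" are all assumptions that would need tuning on real images; the colon is not handled.

```python
# Hypothetical probe layout: (name, x0, y0, x1, y1) in coordinates normalised
# to the glyph's bounding box, with (0, 0) top-left and (1, 1) bottom-right.
PROBES = [
    ("a", 0.5, 0.0, 0.5, 0.2),    # top bar: short vertical probe at top centre
    ("g", 0.5, 0.4, 0.5, 0.6),    # middle bar
    ("d", 0.5, 0.8, 0.5, 1.0),    # bottom bar
    ("f", 0.0, 0.25, 0.2, 0.25),  # upper-left stroke: short horizontal probe
    ("b", 0.8, 0.25, 1.0, 0.25),  # upper-right stroke
    ("e", 0.0, 0.75, 0.2, 0.75),  # lower-left stroke
    ("c", 0.8, 0.75, 1.0, 0.75),  # lower-right stroke
]

# Which probes are hit for each digit, following the usual seven-segment layout.
DIGITS = {
    frozenset("abcdef"): "0",
    frozenset("bc"): "1",
    frozenset("abdeg"): "2",
    frozenset("abcdg"): "3",
    frozenset("bcfg"): "4",
    frozenset("acdfg"): "5",
    frozenset("acdefg"): "6",
    frozenset("abc"): "7",
    frozenset("abcdefg"): "8",
    frozenset("abcdfg"): "9",
}


def probe_hits(glyph, x0, y0, x1, y1, samples=10):
    """Return True if any ink pixel lies on the probe segment."""
    h, w = len(glyph), len(glyph[0])
    for t in range(samples + 1):
        # Sample evenly spaced points along the segment and map them to pixels.
        x = x0 + (x1 - x0) * t / samples
        y = y0 + (y1 - y0) * t / samples
        if glyph[round(y * (h - 1))][round(x * (w - 1))]:
            return True
    return False


def classify(glyph):
    """Map a cropped binary glyph to a digit, or None if no pattern matches."""
    h, w = len(glyph), len(glyph[0])
    if w < 0.5 * h:
        # A tightly cropped "1" is just a narrow vertical bar, so the centre
        # probes would all hit it; treat narrow glyphs as "1" up front.
        return "1"
    hits = frozenset(name for name, *seg in PROBES if probe_hits(glyph, *seg))
    return DIGITS.get(hits)
```

Representing the result as a set of hit probes makes the lookup a plain dictionary access; an unknown combination returns `None`, which is a convenient place to flag glyphs for manual review instead of guessing.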