I am using Tesseract OCR to extract an exclusively numeric string from a PDF file. The PDF contains: 6660003377.pdf but Tesseract recognizes: 66600Q3377.pdf
The input is a TIFF file, and the quality is good enough (see the screenshot).
Is there a way to improve Tesseract's accuracy? I could always replace the Q with a 0, but I'm afraid of further unexpected mistakes.
This is covered in the Tesseract FAQ:
Run a tesseract command like this to permit only digits in the input image:
tesseract imagename outputbase digits
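
If you call Tesseract from a script, the same command can be wrapped with a sanity check on the result, which addresses the worry about further unexpected mistakes. The following is only a minimal sketch, assuming the tesseract binary is on your PATH and that it writes its result to outputbase.txt. The function names and file names are hypothetical, chosen for illustration. As far as I know, the digits config may also allow a few non-digit characters such as "." and "-", so the check accepts only 0-9.

```python
import re
import subprocess


def build_digits_command(image_path, output_base):
    # Same invocation as the FAQ line; the trailing "digits" is the
    # config name passed to Tesseract to restrict recognition to digits.
    return ["tesseract", image_path, output_base, "digits"]


def is_all_digits(text):
    # True only if the text (ignoring surrounding whitespace) is made of
    # the ASCII digits 0-9 and nothing else.
    return re.fullmatch(r"[0-9]+", text.strip()) is not None


def ocr_digits(image_path, output_base):
    # Assumption: tesseract is installed and writes <output_base>.txt.
    subprocess.run(build_digits_command(image_path, output_base), check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        text = handle.read().strip()
    if not is_all_digits(text):
        # Fail loudly instead of silently patching single characters.
        raise ValueError("OCR result is not purely numeric: %r" % text)
    return text
```

Calling ocr_digits("scan.tif", "out") (hypothetical file names) would then either return the digit string or raise an error, so a misread such as 66600Q3377 is caught instead of being passed along.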