ocr tesseract

Tesseract OCR can't recognize basic alphanumeric codes


Tesseract seems to have problems recognizing basic alphanumeric codes. I've tried upscaling the image, switching to a monospace font, and turning off the dictionary, with no improvement in OCR quality.

The image below is recognized as the following:

i3DOIIH_My ActivitiesJ

MmRSes_My Accounm DBYCAe_My Submissions1

Hrti6_My Renewam

enter image description here

As you can see, the recognized characters are completely off.


Solution

  • Your original image is 1508 x 1092 pixels and holds only 4 lines of text plus vertical spacing, so the characters seem too big for Tesseract.

    I reduced the image to 503 x 364 pixels, which makes the characters around 76 pixels high.

    On the reduced image, Tesseract recognizes the text with 100% accuracy.

    Font size and background color both affect the OCR result. The best results come from black text on a white background. Otherwise, image preprocessing is likely required.

    Hope this helps.
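The resize step above can be sketched in Python. This is only an illustration, not the exact procedure used in the answer: it assumes the Pillow and pytesseract packages (neither is mentioned in the original post), and the helper names `scaled_size` and `ocr_downscaled`, the divisor of 3, and the optional `threshold` parameter are made up for the example.

```python
def scaled_size(width, height, divisor=3):
    """Return (width, height) shrunk by `divisor`, rounded to whole pixels."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return max(1, round(width / divisor)), max(1, round(height / divisor))


def ocr_downscaled(path, divisor=3, threshold=None):
    """Shrink the image at `path`, optionally binarize it, and run Tesseract."""
    # Third-party imports (assumed: Pillow and pytesseract) are kept local
    # so that scaled_size() can be used without them installed.
    from PIL import Image
    import pytesseract

    img = Image.open(path).convert("L")  # convert to grayscale
    img = img.resize(scaled_size(img.width, img.height, divisor), Image.LANCZOS)
    if threshold is not None:
        # Pixels brighter than the threshold become white, the rest black.
        img = img.point(lambda p: 255 if p > threshold else 0)
    return pytesseract.image_to_string(img)


# The 1508 x 1092 image from the question, shrunk by a factor of 3.
print(scaled_size(1508, 1092))  # prints (503, 364)
```

Calling `ocr_downscaled("codes.png")` (a placeholder file name) would return the recognized text as a string. If the background is not white, passing something like `threshold=128` is one simple preprocessing option to try, though the right value depends on the image.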