python opencv tesseract python-tesseract

Why is Tesseract unable to detect the single digit in that image?

I have this image, and I'm trying to read it with Tesseract:

My code looks like this:

pytesseract.image_to_string(im)

But all I get is LOW: 56, so Tesseract is unable to read the 1 in the first line. I've also tried specifying a whitelist of only digits, like this:

pytesseract.image_to_string(im, config="tessedit_char_whitelist=0123456789.")

I've also tried eroding the image, but nothing works. Any suggestions?

Solution

The Tesseract documentation page "Improving the quality of the output" is your "holy scripture" when working with Tesseract. In particular, the page segmentation method should always be set explicitly. Here (as in most cases), I'd opt for --psm 6:

Assume a single uniform block of text.

Even without further preprocessing of your image, you already get the desired result:

import cv2
import pytesseract

image = cv2.imread('gBrcd.png')
text = pytesseract.image_to_string(image, config='--psm 6')
print(text.replace('\f', ''))
# 1
# LOW: 56
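
If you still want the character whitelist from the question, note that Tesseract config variables are, as far as I know, passed with the -c flag, so a bare tessedit_char_whitelist=... string is not applied as a whitelist. Below is a minimal sketch of combining both options. The helper names build_tesseract_config and read_text are hypothetical and only serve as illustration, and a digits-only whitelist would of course also suppress the letters in "LOW:".

```python
def build_tesseract_config(psm=6, whitelist=None):
    """Build a config string for pytesseract (hypothetical helper)."""
    parts = [f"--psm {psm}"]
    if whitelist:
        # Config variables are passed to Tesseract via the -c flag.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_text(image_path, psm=6, whitelist=None):
    """Run OCR on an image file with an explicit page segmentation mode."""
    # Local imports, so the config helper works without OpenCV/Tesseract.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        # cv2.imread returns None instead of raising on a bad path.
        raise FileNotFoundError(image_path)
    config = build_tesseract_config(psm, whitelist)
    text = pytesseract.image_to_string(image, config=config)
    return text.replace('\f', '').strip()


# Example call, assuming the image from the question is saved as gBrcd.png:
# print(read_text('gBrcd.png', psm=6))
```

Whether the whitelist is honored can depend on the Tesseract version and OCR engine in use, so it's worth checking the result with and without it.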

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.1
OpenCV:        4.5.2
Tesseract:     5.0.0-alpha.20201127
----------------------------------------