Search code examples
pythonopencvtesseractpython-tesseract

Why is Tesseract unable to detect the single digit in that image?


I have this image, and I'm trying to read it with Tesseract:

enter image description here

My code is like that:

pytesseract.image_to_string(im)

But, what I get is only LOW: 56. So, Tesseract is unable to read the 1 in the first line. I've tried to specify also a whitelist of only digits like

pytesseract.image_to_string(im, config="tessedit_char_whitelist=0123456789.")

and to process the image with an erosion but nothing works. Any suggestions?


Solution

  • Improving the quality of the output is your "holy scripture" when working with Tesseract. Especially, the page segmentation method should always be explicitly set. Here (as most of the times), I'd opt for --psm 6:

    Assume a single uniform block of text.

    Even without further preprocessing of your image, you already get the desired result:

    import cv2
    import pytesseract
    
    image = cv2.imread('gBrcd.png')
    text = pytesseract.image_to_string(image, config='--psm 6')
    print(text.replace('\f', ''))
    # 1
    # LOW: 56
    
    ----------------------------------------
    System information
    ----------------------------------------
    Platform:      Windows-10-10.0.19041-SP0
    Python:        3.9.1
    PyCharm:       2021.1.1
    OpenCV:        4.5.2
    pytesseract:   5.0.0-alpha.20201127
    ----------------------------------------