Tags: python, python-3.x, ocr, tesseract, python-tesseract

How can I reliably recognize the number in this kind of image?


I'm trying to write a script that identifies the number in a picture, specifically in pictures very similar to this one:

The number ranges from 50 down to 1, but I'm having trouble reading it with pytesseract. Here's the code I'm using:

from PIL import Image
from pytesseract import image_to_string

im = Image.open(filename)  # filename: path to the picture
text = image_to_string(im)

All results I get are like this:

[screenshot of the OCR output; image not available]

What can I do to improve the readings?


Solution

  • The Tesseract documentation page "Improving the quality of the output" is your "holy scripture" when working with Tesseract. Before trying binarization, you could first simply convert your image to grayscale:

    from PIL import Image
    import pytesseract
    
    im = Image.open('G9hvi.png').convert('L')
    text = pytesseract.image_to_string(im)
    print(text.replace('\f', ''))
    # 50
    

    Boom! – without any further pre-processing you already get the correct result.

    ----------------------------------------
    System information
    ----------------------------------------
    Platform:      Windows-10-10.0.19041-SP0
    Python:        3.9.1
    PyCharm:       2021.1.2
    Pillow:        8.2.0
    pytesseract:   5.0.0-alpha.20201127
    ----------------------------------------
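
If grayscaling alone turns out not to be enough for some of your images, a possible next step is to tell Tesseract that it is looking at a single line of digits, and optionally to binarize the image first. The following is only a minimal sketch and has not been run against your pictures: `build_digit_config`, `parse_number`, and `read_number` are hypothetical helper names, the `threshold` value is an assumption you would have to tune, and the 1 to 50 range is taken from the question. `--psm 7` and `tessedit_char_whitelist` are standard Tesseract options passed through pytesseract's `config` argument.

```python
import re


def build_digit_config(psm=7):
    """Build a Tesseract config string restricted to digits.

    --psm 7 asks Tesseract to treat the image as a single text line;
    tessedit_char_whitelist limits the output to the listed characters.
    """
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def parse_number(raw_text, low=1, high=50):
    """Return the first integer in the OCR output if it lies in [low, high], else None."""
    match = re.search(r"\d+", raw_text)
    if match is None:
        return None
    value = int(match.group())
    return value if low <= value <= high else None


def read_number(filename, threshold=None):
    """Grayscale the image (optionally binarize it), OCR it, and parse the number."""
    # Imported lazily so the helpers above can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    im = Image.open(filename).convert("L")
    if threshold is not None:
        # Simple global threshold (assumed value): brighter pixels become white, the rest black.
        im = im.point(lambda p: 255 if p > threshold else 0)
    raw = pytesseract.image_to_string(im, config=build_digit_config())
    return parse_number(raw)
```

You would call it as `read_number('G9hvi.png')`, or `read_number('G9hvi.png', threshold=128)` to try binarization. Returning `None` for anything outside the expected range makes misreads easy to spot instead of silently accepting them.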