Tesseract not accurate at all, even with config

My code:

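import base64
from io import BytesIO

import pytesseract
from PIL import Image

for index, img in enumerate(data):  # data is a list of base64-encoded data-URL strings
    # img[22:] strips a 22-character prefix such as 'data:image/png;base64,'
    b64 = base64.b64decode(bytes(img[22:], encoding='utf-8'))
    raw = BytesIO(b64)
    im = Image.open(raw).convert('LA')  # greyscale + alpha
    pixels = im.load()
    width, height = im.size
    # binarize: pixels brighter than 100 -> white, the rest -> black (alpha kept opaque)
    for x in range(width):
        for y in range(height):
            if pixels[x, y][0] > 100: pixels[x, y] = (255, 255)
            else: pixels[x, y] = (0, 255)
    print(pytesseract.image_to_string(im, config='-c tessedit_char_whitelist=1234567890plus?'))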
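The `img[22:]` slice only works if every string starts with a prefix of exactly 22 characters (true for `data:image/png;base64,`). A minimal sketch that splits on the first comma instead, so other prefixes such as `data:image/jpeg;base64,` also work; `decode_data_url` is a hypothetical helper name, not part of any library:

```python
import base64
from io import BytesIO


def decode_data_url(data_url: str) -> BytesIO:
    """Return the binary payload of a base64 data URL as a file-like object.

    Hypothetical helper: splits on the first comma instead of slicing a
    fixed 22-character prefix.
    """
    header, _, payload = data_url.partition(',')
    if not header.endswith(';base64'):
        raise ValueError('not a base64 data URL: %r' % header[:40])
    return BytesIO(base64.b64decode(payload))


# Usage: the prefix is stripped regardless of its length
raw = decode_data_url('data:image/png;base64,' + base64.b64encode(b'\x89PNG').decode())
print(raw.read())  # b'\x89PNG'
```

The returned `BytesIO` can be passed straight to `Image.open` in place of `BytesIO(b64)`.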

My Image:
Te Ys
What can I do to make this better? I tried every `--psm` value from 0 to 13, and also passing the `-c` flag in the `config` argument.


  • You need to invert your image so that the text is dark on a light background. Then Tesseract reads it accurately.

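    import pytesseract
    import cv2
    # Windows install path; adjust it, or drop this line if tesseract is on PATH
    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
    image = cv2.imread('addition.png', 0)  # 0 = load as single-channel greyscale
    image = 255 - image                    # invert: light text on dark -> dark text on light
    for psm in range(6, 13 + 1):           # try page segmentation modes 6..13
        config = '--oem 3 --psm %d' % psm  # OEM 3 = default engine selection
        txt = pytesseract.image_to_string(image, config=config, lang='eng')
        print('psm ', psm, ':', txt)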

    This gives the correct result for every psm value from 6 to 13:

    psm  6 : 18 plus 16?
    psm  7 : 18 plus 16?
    psm  8 : 18 plus 16?
    psm  9 : 18 plus 16?
    psm  10 : 18 plus 16?
    psm  11 : 18 plus 16?
    psm  12 : 18 plus 16?
    psm  13 : 18 plus 16?
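The same inversion can be folded into the question's own Pillow pipeline. A minimal sketch, assuming Pillow is installed: instead of the per-pixel loop in `LA` mode, convert to `L` and use `Image.point` so that thresholding and inversion happen in one pass and the light text comes out black on white. `binarize_dark_on_light` is a hypothetical helper name, and the threshold of 100 is simply taken from the question's loop.

```python
from PIL import Image


def binarize_dark_on_light(im: Image.Image, threshold: int = 100) -> Image.Image:
    """Threshold a light-on-dark image and invert it in one pass.

    Pixels brighter than `threshold` (the light text) become black (0);
    everything else (the dark background) becomes white (255).
    """
    gray = im.convert('L')  # single greyscale channel, alpha dropped
    return gray.point(lambda p: 0 if p > threshold else 255)


# In the question's loop this would replace the LA conversion and pixel loop:
#   im = binarize_dark_on_light(Image.open(raw))
#   print(pytesseract.image_to_string(im, config='--psm 7'))
demo = Image.new('L', (4, 1), 20)  # dark background
demo.putpixel((1, 0), 230)         # one bright "text" pixel
out = binarize_dark_on_light(demo)
print([out.getpixel((x, 0)) for x in range(4)])  # [255, 0, 255, 255]
```

Using `point` builds a 256-entry lookup table once, which is much faster than assigning pixels one by one from Python.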