Tags: python, image, image-processing, ocr, python-tesseract

Pytesseract doesn't recognize simple text in image


I want to recognize an image like this:

[image]

I am using the following config:

config="--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ,."
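For readers unfamiliar with how such a string reaches Tesseract, here is a minimal sketch of assembling it and passing it through pytesseract's `config` argument. The helper names `build_config` and `ocr_with_config` are hypothetical, made up for illustration; only `pytesseract.image_to_string` and the flags themselves come from the question.

```python
def build_config(psm: int, oem: int, whitelist: str = "") -> str:
    """Assemble a Tesseract config string (hypothetical helper)."""
    parts = [f"--psm {psm}", f"--oem {oem}"]
    if whitelist:
        # Restrict recognition to the listed characters.
        # Assumes the whitelist contains no spaces.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def ocr_with_config(image, config: str) -> str:
    """Run OCR with the given config string (hypothetical helper)."""
    import pytesseract  # imported lazily; needs a Tesseract install
    return pytesseract.image_to_string(image, config=config)


# Reproduces the config string from the question.
config = build_config(
    psm=6,
    oem=3,
    whitelist="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ,.",
)
print(config)
```

Building the string in one place makes it easier to try other `--psm` values or drop the whitelist while debugging.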

But when I run OCR on the image, I get the following:

1581

1

W

The text in the image looks perfectly clear to me, so I suspect the problem is with pytesseract. Can you help?


Solution

  • Preprocessing the image to obtain a binary image before performing OCR seems to work. You could also try resizing (enlarging) the image so that more detail is visible.

    [image]

    Results

    158.1
    1
    IT
    
    import cv2
    import pytesseract
    
    # Path to the Tesseract executable (adjust for your system)
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    
    # Grayscale and Otsu's threshold
    image = cv2.imread('1.png')
    if image is None:
        raise FileNotFoundError("Could not read '1.png'")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    
    # Perform text extraction
    data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
    print(data)
    
    # Show the thresholded image for inspection; press any key to close
    cv2.imshow('thresh', thresh)
    cv2.waitKey()