Tags: python, machine-learning, neural-network, image-recognition, python-tesseract

Pytesseract works incorrectly with handwritten letters


I have to recognize handwritten letters and their coordinates, as in this image: [image to recognize]

I tried to do this with pytesseract, but it can recognize only printed text and works incorrectly on my images. I don't have time to write my own neural network and want to use a ready-made solution such as pytesseract. I know that it can do this, but this code works incorrectly.

import cv2
import pytesseract
import imutils
image = cv2.imread('test/task.jpg')
image = imutils.resize(image, width=700)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
thresh = cv2.GaussianBlur(thresh, (3,3), 0)
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imwrite('images/thresh.png', thresh)
cv2.waitKey()

This code returns the wrong answer:

ti | ee
ares” * ae
de le lc ld

What am I doing wrong?

P.S. I converted my image using adaptive thresholding, and it now looks like the thresholded photo below, but the code still works incorrectly (now I just call the image_to_string() method on the converted image):

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'
image = cv2.imread('output.png')
data = pytesseract.image_to_string(image, lang='eng', config='--psm 6')
print(data)

[The thresholded photo]

It returns this:

a Oe '
Pee ee
eee ee ee ee
re
eB
STI AT TTT
“Shen if
ae 6
jal ne
yo l
a) Ne es oe
Seaneaeer =
ee es ee
a en ee
ee rt

Solution

  • I have a suggestion for making the image clearer by removing the background.

    You can use inRange thresholding.

    To use inRange thresholding, first convert the image to the HSV color space, then set the lower and upper boundaries of the inRange method. The boundary values can be set manually. The result of inRange is a binary mask, which you can use to remove the background. For example:

    [image: the background removed using the inRange mask]
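    To make the masking step concrete, here is a minimal, self-contained sketch of inRange on a synthetic BGR image; the boundary values below are the same assumptions as in the code further down and would need tuning for a real photo:

    ```python
    import cv2
    import numpy as np

    # Synthetic "photo": light gray paper with a dark ink patch on it
    img = np.full((60, 60, 3), 200, dtype=np.uint8)   # light background
    img[20:40, 20:40] = (40, 40, 40)                  # dark "ink" square

    # Convert to HSV and keep only dark pixels (low value), any hue/saturation
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    msk = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([179, 255, 80]))

    # The mask is 255 where the "ink" is and 0 on the background
    print(msk[30, 30], msk[5, 5])  # 255 0
    ```

    The upper bound `[179, 255, 80]` keeps every hue and saturation but only pixels with value (brightness) up to 80, which is what isolates dark pen strokes from a light background.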

    Afterwards, you can use the Tesseract page segmentation modes (psm). Each psm value will give a different output. For example, psm 6 gives the result:

    B
    JN
    A 3 C
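    A quick way to compare segmentation modes is to run the same image through several --psm values and inspect each output. This is a sketch on a synthetic image (it assumes the tesseract binary is installed and on PATH; the psm values chosen are just examples):

    ```python
    import cv2
    import numpy as np
    import pytesseract

    # Render some text on a white canvas so the example is self-contained
    img = np.full((100, 300, 3), 255, dtype=np.uint8)
    cv2.putText(img, "ABC", (30, 70), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 0), 4)

    # Try a few page segmentation modes and collect each output
    results = {}
    for psm in (6, 7, 13):
        results[psm] = pytesseract.image_to_string(img, config=f"--psm {psm}")

    for psm, txt in results.items():
        print(psm, repr(txt.strip()))
    ```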
    

    If that is not the desired output, you can apply further improvements: other image-processing steps, or a different approach altogether, such as the EAST text detector.

    If you still have trouble, you can localize the detected text and observe why the text is misinterpreted. For example:

    Cropped image      psm 6 output
    (cropped image)    BS
    (cropped image)    A
    (cropped image)    8
    (cropped image)    (_

    As we can see with psm 6, B and C are misinterpreted. Maybe psm 7 will interpret them correctly; you need to experiment with other values. If you don't want to, you can use another deep-learning method, such as the EAST text detector.

    Code:

    import cv2
    from numpy import array
    import pytesseract
    from pytesseract import Output
    
    
    # Load the image
    img = cv2.imread("f4hjh.jpg")
    
    # Convert to the hsv color-space
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    
    # Get a binary mask: keep dark pixels (value <= 80) across all hues
    msk = cv2.inRange(hsv, array([0, 0, 0]), array([179, 255, 80]))
    
    # Dilate the mask to thicken the strokes, then invert to black-on-white
    krn = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
    dlt = cv2.dilate(msk, krn, iterations=1)
    thr = 255 - cv2.bitwise_and(dlt, msk)
    
    txt = pytesseract.image_to_string(thr, config="--psm 6")
    print(txt)
    

    For detecting and localizing the text in the image:

    # OCR
    d = pytesseract.image_to_data(thr, config="--psm 6", output_type=Output.DICT)
    n_boxes = len(d['level'])
    
    for i in range(n_boxes):
    
        # Get the localized region
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    
        # Draw rectangle to the detected region
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 0, 255), 5)
    
        # Crop the image
        crp = thr[y:y+h, x:x+w]
    
        # OCR
        txt = pytesseract.image_to_string(crp, config="--psm 6")
        print(txt)
    
        # Display the cropped image
        cv2.imshow("crp", crp)
        cv2.waitKey(0)
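    Since the question also asks for the letters' coordinates, note that the image_to_data dictionary already carries them; extracting (letter, x, y) pairs is plain Python. The dictionary below is hypothetical sample data in the Output.DICT shape, standing in for what pytesseract would return on the thresholded image:

    ```python
    # Hypothetical image_to_data output (Output.DICT shape), standing in
    # for what pytesseract returns on a real image
    d = {
        'level':  [1, 5, 5, 5],
        'conf':   ['-1', '96', '91', '88'],
        'text':   ['',   'B',  'A',  'C'],
        'left':   [0,    40,   10,   80],
        'top':    [0,    15,   60,   60],
        'width':  [200,  20,   22,   21],
        'height': [120,  25,   24,   26],
    }

    # Keep only confident, non-empty detections and pair each letter
    # with the centre of its bounding box
    letters = []
    for i in range(len(d['text'])):
        if d['text'][i].strip() and float(d['conf'][i]) > 0:
            cx = d['left'][i] + d['width'][i] // 2
            cy = d['top'][i] + d['height'][i] // 2
            letters.append((d['text'][i], cx, cy))

    print(letters)  # [('B', 50, 27), ('A', 21, 72), ('C', 90, 73)]
    ```

    Filtering on the confidence value skips the structural rows (page/block/line entries have conf -1 and empty text), leaving only actual word detections with their positions.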