Tags: python, machine-learning, neural-network, image-recognition, python-tesseract

Pytesseract works incorrectly with handwritten letters


I have to recognize handwritten letters and their coordinates, as in this image: [image to recognize]

I tried to do this with pytesseract, but it can recognize only printed text and works incorrectly on my images. I don't have time to write my own neural network and want to use a ready-made solution such as pytesseract. I know that it can do this, but this code works incorrectly.

import cv2
import pytesseract
import imutils
image = cv2.imread('test/task.jpg')
image = imutils.resize(image, width=700)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
thresh = cv2.GaussianBlur(thresh, (3,3), 0)
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imwrite('images/thresh.png', thresh)
cv2.waitKey()

This code returns the wrong answer:

ti | ee
ares” * ae
de le lc ld

What am I doing wrong?

P.S. I converted my image using adaptive thresholding, and it now looks like the thresholded photo below, but the code still works incorrectly (now I just call the image_to_string() method on the converted image):

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'
image = cv2.imread('output.png')
data = pytesseract.image_to_string(image, lang='eng', config='--psm 6')
print(data)

[The thresholded photo]

It returns this:

a Oe '
Pee ee
eee ee ee ee
re
eB
STI AT TTT
“Shen if
ae 6
jal ne
yo l
a) Ne es oe
Seaneaeer =
ee es ee
a en ee
ee rt

Solution

  • I have a suggestion for making the image clearer by removing the background.

    You can use inRange thresholding.

    To use inRange thresholding, first convert the image to the HSV color space, then set the lower and upper boundaries of the inRange method. The boundary values can be set manually. The result of inRange is a binary mask, which you can use to remove the background. For example:

    [image: the background removed using the inRange mask]
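    To make the masking step concrete, here is a minimal, self-contained sketch of inRange on a synthetic BGR image; the boundary values below are the same assumptions as in the code further down and would need tuning for a real photo:

    ```python
    import cv2
    import numpy as np

    # Synthetic "photo": light gray paper with a dark ink patch on it
    img = np.full((60, 60, 3), 200, dtype=np.uint8)   # light background
    img[20:40, 20:40] = (40, 40, 40)                  # dark "ink" square

    # Convert to HSV and keep only dark pixels (low value), any hue/saturation
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    msk = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([179, 255, 80]))

    # The mask is 255 where the "ink" is and 0 on the background
    print(msk[30, 30], msk[5, 5])  # 255 0
    ```

    The upper bound `[179, 255, 80]` keeps every hue and saturation but only pixels with value (brightness) up to 80, which is what isolates dark pen strokes from a light background.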

    Afterwards, you can use the Tesseract page segmentation modes (psm). Each psm value will give a different output. For example, psm 6 gives the result:

    B
    JN
    A 3 C
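    A quick way to compare segmentation modes is to run the same image through several --psm values and inspect each output. This is a sketch on a synthetic image (it assumes the tesseract binary is installed and on PATH; the psm values chosen are just examples):

    ```python
    import cv2
    import numpy as np
    import pytesseract

    # Render some text on a white canvas so the example is self-contained
    img = np.full((100, 300, 3), 255, dtype=np.uint8)
    cv2.putText(img, "ABC", (30, 70), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 0), 4)

    # Try a few page segmentation modes and collect each output
    results = {}
    for psm in (6, 7, 13):
        results[psm] = pytesseract.image_to_string(img, config=f"--psm {psm}")

    for psm, txt in results.items():
        print(psm, repr(txt.strip()))
    ```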
    

    If that is not the desired output, you can apply further improvements: other image-processing steps, or a different approach altogether, such as the EAST text detector.

    If you still have trouble, you can localize the detected text and observe why the text is misinterpreted. For example:

    Cropped image      psm 6 output
    (cropped image)    BS
    (cropped image)    A
    (cropped image)    8
    (cropped image)    (_

    As we can see with psm 6, B and C are misinterpreted. Maybe psm 7 will interpret them correctly; you need to experiment with other values. If you don't want to, you can use another deep-learning method, such as the EAST text detector.

    Code:

    import cv2
    from numpy import array
    import pytesseract
    from pytesseract import Output
    
    
    # Load the image
    img = cv2.imread("f4hjh.jpg")
    
    # Convert to the hsv color-space
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    
    # Get a binary mask: keep dark pixels (value <= 80) across all hues
    msk = cv2.inRange(hsv, array([0, 0, 0]), array([179, 255, 80]))
    
    # Dilate the mask to thicken the strokes, then invert to black-on-white
    krn = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
    dlt = cv2.dilate(msk, krn, iterations=1)
    thr = 255 - cv2.bitwise_and(dlt, msk)
    
    txt = pytesseract.image_to_string(thr, config="--psm 6")
    print(txt)
    

    For detecting and localizing the text in the image:

    # OCR
    d = pytesseract.image_to_data(thr, config="--psm 6", output_type=Output.DICT)
    n_boxes = len(d['level'])
    
    for i in range(n_boxes):
    
        # Get the localized region
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    
        # Draw rectangle to the detected region
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 0, 255), 5)
    
        # Crop the image
        crp = thr[y:y+h, x:x+w]
    
        # OCR
        txt = pytesseract.image_to_string(crp, config="--psm 6")
        print(txt)
    
        # Display the cropped image
        cv2.imshow("crp", crp)
        cv2.waitKey(0)
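    Since the question also asks for the letters' coordinates, note that the image_to_data dictionary already carries them; extracting (letter, x, y) pairs is plain Python. The dictionary below is hypothetical sample data in the Output.DICT shape, standing in for what pytesseract would return on the thresholded image:

    ```python
    # Hypothetical image_to_data output (Output.DICT shape), standing in
    # for what pytesseract returns on a real image
    d = {
        'level':  [1, 5, 5, 5],
        'conf':   ['-1', '96', '91', '88'],
        'text':   ['',   'B',  'A',  'C'],
        'left':   [0,    40,   10,   80],
        'top':    [0,    15,   60,   60],
        'width':  [200,  20,   22,   21],
        'height': [120,  25,   24,   26],
    }

    # Keep only confident, non-empty detections and pair each letter
    # with the centre of its bounding box
    letters = []
    for i in range(len(d['text'])):
        if d['text'][i].strip() and float(d['conf'][i]) > 0:
            cx = d['left'][i] + d['width'][i] // 2
            cy = d['top'][i] + d['height'][i] // 2
            letters.append((d['text'][i], cx, cy))

    print(letters)  # [('B', 50, 27), ('A', 21, 72), ('C', 90, 73)]
    ```

    Filtering on the confidence value skips the structural rows (page/block/line entries have conf -1 and empty text), leaving only actual word detections with their positions.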