
Preprocess images using OpenCV for pytesseract OCR


I want to use OCR (pytesseract) to recognize the text inside images like these:

[Three example images of arrows containing text, not reproduced here]

I have thousands of these arrows. So far, my procedure is as follows: I first resize the image (for another process). Then I crop the image to get rid of most of the arrow. Next, I draw a white rectangle as a frame to remove further noise while still keeping some distance between the text and the image borders, which helps text recognition. I then resize the image again so that capital letters are ~30 px tall (https://groups.google.com/forum/#!msg/tesseract-ocr/Wdh_JJwnw94/24JHDYQbBQAJ). Finally, I binarize the image with a threshold of 150.

Full code:

import cv2

image_file = '001.jpg'

# load the input image and grab the image dimensions
image = cv2.imread(image_file, cv2.IMREAD_GRAYSCALE)
(h_1, w_1) = image.shape[:2]

# resize the image and grab the new image dimensions
image = cv2.resize(image, (int(w_1*320/h_1), 320))
(h_1, w_1) = image.shape

# crop image
image_2 = image[70:h_1-70, 20:w_1-20]

# get image_2 height, width
(h_2, w_2) = image_2.shape

# draw white rectangle as a frame around the number -> remove noise
cv2.rectangle(image_2, (0, 0), (w_2, h_2), (255, 255, 255), 40)

# resize the image so that capital letters are ~30 px tall
image_2 = cv2.resize(image_2, (int(w_2*50/h_2), 50))

# image binarization
ret, image_2 = cv2.threshold(image_2, 150, 255, cv2.THRESH_BINARY)

# save image to file
cv2.imwrite('processed_' + image_file, image_2)

# tesseract part can be commented out
import pytesseract
config_7 = ("-c tessedit_char_whitelist=0123456789AB --oem 1 --psm 7")
text = pytesseract.image_to_string(image_2, config=config_7)
print("OCR TEXT: " + "{}\n".format(text))

The problem is that the text inside the arrow is never centered, so the fixed crop described above sometimes cuts off part of the text (e.g. in the 50A image).

Is there an image-processing method to get rid of the arrow more elegantly, for instance by detecting and deleting its contour? I am more interested in the OpenCV part than in the Tesseract text recognition.

Any help is appreciated.


Solution

  • If you look at the pictures, you will see that each one contains a white arrow, which is also the biggest contour in the image (especially if you first draw a black border around the image so the arrow does not touch the edges). If you draw the arrow (the biggest contour) filled onto a blank mask and then erode the mask a little, you can perform a per-element bitwise AND of the actual image and the eroded mask. Everything outside the slightly shrunken arrow, including the arrow's edge and the background, is then removed, and only the text inside remains. If this is not clear, look at the code and comments below; it is actually pretty simple.

    # imports
    import cv2
    import numpy as np
    
    img = cv2.imread("number.png")  # read image
    # you can resize the image here if you like - it should still work for both sizes
    h, w = img.shape[:2]  # get the image's height and width
    img = cv2.resize(img, (int(w*320/h), 320))
    h, w = img.shape[:2]
    
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # transform to grayscale
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]  # perform Otsu threshold
    cv2.rectangle(thresh, (0, 0), (w, h), (0, 0, 0), 2)  # black border so the arrow does not touch the image edges
    contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]  # search for contours; [-2] works in OpenCV 3.x and 4.x
    max_cnt = max(contours, key=cv2.contourArea)  # select biggest one
    mask = np.zeros((h, w), dtype=np.uint8)  # create a black mask
    cv2.drawContours(mask, [max_cnt], -1, (255, 255, 255), -1)  # draw biggest contour on the mask
    kernel = np.ones((15, 15), dtype=np.uint8)  # make a kernel with appropriate values - in both cases (resized and original) 15 is ok
    erosion = cv2.erode(mask, kernel, iterations=1)  # erode the mask with given kernel
    
    reverse = cv2.bitwise_not(img)  # inverted image: 0 becomes 255 and 255 becomes 0
    img = cv2.bitwise_and(reverse, reverse, mask=erosion)  # per-element bitwise AND; pixels outside the eroded mask become 0
    img = cv2.bitwise_not(img)  # invert again: the text is dark and everything outside the mask is white
    
    # save image to file and display
    cv2.imwrite("res.png", img)
    cv2.imshow("img", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    Result:

    [Result image not reproduced here]