Tags: python · opencv · ocr · tesseract · python-tesseract

What could I do to improve my OCR result using pytesseract?


I am trying to apply OCR using OpenCV and Python-tesseract to convert the following image to text: Original image.

But tesseract has not managed to read the image correctly so far. It reads: "uleswylly Bie7 Srp a7" instead.

I have taken the following steps to pre-process the image before I feed it to tesseract:

  1. First I upscale the image:
# Image scaling
def set_image_dpi(img):
    # Get current dimensions of the image
    height, width = img.shape[:2]

    # Define scale factor
    scale_factor = 6

    # Calculate new dimensions
    new_height = int(height * scale_factor)
    new_width = int(width * scale_factor)

    # Resize image
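    # Note: cv2.resize uses bilinear interpolation (cv2.INTER_LINEAR) by default;
    # cv2.INTER_CUBIC is often preferred when enlarging text.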
    return cv2.resize(img, (new_width, new_height))

Image result: result1.png

  2. Normalize the image:
# Normalization
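# cv2.normalize with cv2.NORM_MINMAX stretches the pixel intensities to span the full 0-255 range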
norm_img = np.zeros((img.shape[0], img.shape[1]))
img = cv2.normalize(img, norm_img, 0, 255, cv2.NORM_MINMAX)

Image result: result2.png

  3. Then I remove some noise:
# Remove noise
def remove_noise(img):
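    # Parameters: src, dst, h (luminance filter strength), hColor (color filter strength),
    # templateWindowSize, searchWindowSize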
    return cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 15)

Image result: result3.png

  4. Get the grayscale image:
# Get grayscale
def get_grayscale(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

Image result: result4.png

  5. Apply thresholding:
# Thresholding
def thresholding(img):
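    # Note: when cv2.THRESH_OTSU is set, the fixed threshold value (150) is ignored;
    # Otsu's method picks the threshold automatically.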
    return cv2.threshold(img, 150, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

Image result: result5.png

  6. Invert the image color:
# Invert the image
def invert(img):
    return cv2.bitwise_not(img)

Image result: result6.png

  7. Finally I pass the image to pytesseract:
# Pass preprocessed image to pytesseract
text = pytesseract.image_to_string(img)
print("Text found: " + text)

pytesseract output: "uleswylly Bie7 Srp a7"
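
Putting these steps together, the full pre-processing pipeline reads roughly like this (a sketch using the functions defined above; the normalization snippet from step 2 is wrapped in a helper here, and 'original.png' is only a placeholder name for the input image):

# Combined pre-processing pipeline (sketch)
def normalize(img):
    norm_img = np.zeros((img.shape[0], img.shape[1]))
    return cv2.normalize(img, norm_img, 0, 255, cv2.NORM_MINMAX)

img = cv2.imread('original.png')          # placeholder filename for the input image
img = set_image_dpi(img)                  # 1. upscale
img = normalize(img)                      # 2. normalize
img = remove_noise(img)                   # 3. remove noise
img = get_grayscale(img)                  # 4. grayscale
img = thresholding(img)                   # 5. threshold
img = invert(img)                         # 6. invert
text = pytesseract.image_to_string(img)   # 7. OCR
print("Text found: " + text)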

How can I improve my pre-processing so that pytesseract can actually read the image? Any help would be greatly appreciated!

Thanks in advance,

Steenert


Solution

  • The problem is a bit challenging to solve without overfitting the solution to this specific image.

    Let's assume that the text is bright and colorless, and that it is surrounded by colored pixels. We may also assume that the background is relatively homogeneous.

    We may start with result3.png and use the following stages:

    • Add padding with the color of the top left pixel.
      The padding is used as preparation for floodFill (required because some colored pixels touch the image margins).
    • Fill the background with light blue color.
      Note that the selected color is a bit of overfitting, because its saturation level needs to be close to the saturation of the red pixels.
    • Convert from BGR to HSV color space, and extract the saturation channel.
    • Apply thresholding (use cv2.THRESH_OTSU for automatic thresholding).
    • Apply pytesseract.image_to_string to the thresholded image.

    Code sample:

    import cv2
    import numpy as np
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # May be required when using Windows
    
    img = cv2.imread('result3.png')  # Read result3.png
    
    # Add padding with the color of the top left pixel
    pad_color = img[0, 0, :]
    padded_img = np.full((img.shape[0]+10, img.shape[1]+10, 3), pad_color, np.uint8)
    padded_img[5:-5, 5:-5, :] = img
    
    cv2.floodFill(padded_img, None, (0, 0), (255, 100, 100), loDiff=(10, 10, 10), upDiff=(10, 10, 10))  # Fill the background with blue color.
    cv2.imwrite('result7.png', padded_img)
    
    # Convert from BGR to HSV color space, and extract the saturation channel.
    hsv = cv2.cvtColor(padded_img, cv2.COLOR_BGR2HSV)
    s = hsv[:, :, 1]
    cv2.imwrite('result8.png', s)
    
    # Apply thresholding (use `cv2.THRESH_OTSU` for automatic thresholding)
    thresh = cv2.threshold(s, 0, 255, cv2.THRESH_OTSU)[1]
    cv2.imwrite('result9.png', thresh)
    
    # Pass preprocessed image to PyTesseract
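    # --psm 6 sets Tesseract's page segmentation mode to "assume a single uniform block of text"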
    text = pytesseract.image_to_string(thresh, config="--psm 6")
    print("Text found: " + text)
    

    Output:
    Text found: Jules -Lv: 175 -P.17


    result7.png (after floodFill)

    result8.png (after extracting the saturation channel)

    result9.png (after thresholding)