I am trying to apply OCR using OpenCV and Python-tesseract to convert the following image to text: Original image.
But tesseract has not managed to read the image correctly so far. It reads: uleswylly Bie7 Srp a7 instead.
I have taken the following steps to pre-process the image before I feed it to tesseract:
# Image scaling
def set_image_dpi(img):
    # Get current dimensions of the image
    height, width = img.shape[:2]
    # Define scale factor
    scale_factor = 6
    # Calculate new dimensions
    new_height = int(height * scale_factor)
    new_width = int(width * scale_factor)
    # Resize image
    return cv2.resize(img, (new_width, new_height))
Image result: result1.png
# Normalization
norm_img = np.zeros((img.shape[0], img.shape[1]))
img = cv2.normalize(img, norm_img, 0, 255, cv2.NORM_MINMAX)
Image result: result2.png
# Remove noise
def remove_noise(img):
    return cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 15)
Image result: result3.png
# Get grayscale
def get_grayscale(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Image result: result4.png
# Thresholding
def thresholding(img):
    return cv2.threshold(img, 150, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
Image result: result5.png
# Invert the image
def invert(img):
    return cv2.bitwise_not(img)
Image result: result6.png
# Pass preprocessed image to pytesseract
text = pytesseract.image_to_string(img)
print("Text found: " + text)
pytesseract output: "uleswylly Bie7 Srp a7"
I would like to improve my pre-processing so that pytesseract can actually read the image. Any help would be greatly appreciated!
Thanks in advance,
Steenert
The problem is a bit challenging to solve without overfitting the solution to this particular image...
Let us assume that the text is bright and colorless, and is surrounded by colored pixels. We may also assume that the background is relatively homogeneous.
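We may start with result3.png and use the following stages:
- Add padding with the color of the top left pixel (in preparation for floodFill; required because some colored pixels touch the image margins).
- Fill the background with blue color using floodFill.
- Convert from BGR to HSV color space, and extract the saturation channel.
- Apply thresholding (use cv2.THRESH_OTSU for automatic thresholding).
- Apply pytesseract.image_to_string to the thresholded image.

Code sample: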
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' # May be required when using Windows
img = cv2.imread('result3.png') # Read result3.png
# Add padding with the color of the top left pixel
pad_color = img[0, 0, :]
padded_img = np.full((img.shape[0]+10, img.shape[1]+10, 3), pad_color, np.uint8)
padded_img[5:-5, 5:-5, :] = img
cv2.floodFill(padded_img, None, (0, 0), (255, 100, 100), loDiff=(10, 10, 10), upDiff=(10, 10, 10)) # Fill the background with blue color.
cv2.imwrite('result7.png', padded_img)
# Convert from BGR to HSV color space, and extract the saturation channel.
hsv = cv2.cvtColor(padded_img, cv2.COLOR_BGR2HSV)
s = hsv[:, :, 1]
cv2.imwrite('result8.png', s)
# Apply thresholding (use `cv2.THRESH_OTSU` for automatic thresholding)
thresh = cv2.threshold(s, 0, 255, cv2.THRESH_OTSU)[1]
cv2.imwrite('result9.png', thresh)
# Pass preprocessed image to PyTesseract
text = pytesseract.image_to_string(thresh, config="--psm 6")
print("Text found: " + text)
Output:
Text found: Jules -Lv: 175 -P.17
result7.png (after floodFill):