
Python OCR on a noisy and rotated image


I've been working on a Python project that recognizes codes in pictures, but I've run into trouble when the code is sideways and the background is noisy. Nothing I've tried has come close to working, so I'm shooting in the dark; any insight into how to solve this problem would be appreciated. I've tried everything from Tesseract to keras-ocr, but I only get back random numbers and letters.

import pytesseract
import PIL.Image
import cv2

# --psm 11: sparse text (find as much text as possible, in no particular order)
# --oem 3: default engine mode
my_config = r"--psm 11 --oem 3"

text = pytesseract.image_to_string(PIL.Image.open(path_to_image2), config=my_config)
print(text)

I've also tried lots of things with cv2, like creating masks, but the masks aren't dynamic enough to filter out the background.

Here are two examples of the images I've been dealing with. Thank you to anyone who tries to help me out!

(two example images attached)


Solution

  • I hope this helps in your case. I tried the exact same scenario with easyocr.

    import easyocr
    import cv2
    import numpy as np
    from scipy import ndimage


    def image_sharpen(image, kernel_size=(5, 5), sigma=1.0, amount=1.0, threshold=0):
        """Unsharp masking: subtract a Gaussian-blurred copy to boost edges."""
        blurred = cv2.GaussianBlur(image, kernel_size, sigma)
        sharpened = float(amount + 1) * image - float(amount) * blurred
        sharpened = np.clip(sharpened, 0, 255).round().astype(np.uint8)
        if threshold > 0:
            # Cast to int16 first; uint8 subtraction would wrap around
            diff = np.absolute(image.astype(np.int16) - blurred.astype(np.int16))
            low_contrast_mask = diff < threshold
            np.copyto(sharpened, image, where=low_contrast_mask)
        return sharpened

    def reduce_brightness(image, gamma=1.0):
        """Gamma correction via a lookup table (gamma < 1 darkens the image)."""
        inv_gamma = 1.0 / gamma
        table = np.array([((i / 255.0) ** inv_gamma) * 255
                          for i in np.arange(0, 256)]).astype("uint8")
        return cv2.LUT(image, table)

    gamma = 0.35
    rotation_angle = 155  # use 25 for the second image
    image = cv2.imread('pze5c.png')  # 'yCVet.jpg' for the second image
    rotated_image = ndimage.rotate(image, rotation_angle)
    sharpened = image_sharpen(rotated_image)
    adjusted = reduce_brightness(sharpened, gamma=gamma)

    cv2.imwrite('resize.png', adjusted)
    # cv2.imshow('', adjusted)
    # cv2.waitKey(0)
    reader = easyocr.Reader(['en'], gpu=False)
    result = reader.readtext('resize.png')
    for detection in result:
        print(detection)
    

    The output I got is:

    ([[37, 394], [994, 394], [994, 505], [37, 505]], '9 Q 9 E - C R H L G F  D ] B', 0.13722358295856807)
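    Each detection easyocr returns is a (bounding box, text, confidence) tuple; pulling the fields apart (using the output above as sample data):

```python
# Sample detection copied from the easyocr output above
detection = ([[37, 394], [994, 394], [994, 505], [37, 505]],
             '9 Q 9 E - C R H L G F  D ] B', 0.13722358295856807)

box, text, confidence = detection
cleaned = text.replace(' ', '')  # the per-character spacing is easy to strip
```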
    

    Instead of J it recognized ].

    For the other image I changed the rotation angle to 25. The output is:

    ([[78, 565], [1515, 565], [1515, 678], [78, 678]], "Z G P H ' $ B X Z B 2 - 5 M B *", 0.3908300967267578)
    

    Instead of S it picked $. Feel free to play around with the gamma, rotation angle, and other parameters. For higher accuracy you could go with Google Cloud Vision, but it is not open source.