python opencv image-processing ocr receipt

Can the faded parts of a character in a receipt be restored?


I have some scanned receipts and need to extract their text using OCR. Because the print on a receipt fades over time, some words are no longer clear, and this degrades the OCR result.

Some examples of faded words:

[Image: Faded 1]

[Image: Faded 2]

Are there any ways to restore the faded parts so that I can improve the OCR result?

I have tried image thresholding and image smoothing in OpenCV but the results are not very satisfactory. Can the image be further processed?

Averaging, then Gaussian threshold: [image]

Gaussian blur, then Gaussian threshold: [image]


Solution

  • This method is not perfect and does not suit every character; it works better if you first isolate the character regions and then apply it to each one separately. It is only a basic idea that you may be able to build on. The resulting characters will not match the original font; they may simply be more readable. That is expected with this approach: the characters are too damaged to identify the original font reliably.

    import sys
    import cv2
    import numpy as np
    
    # Load the image (expected next to this script) and upscale it 4x
    im = cv2.imread(sys.path[0]+'/im.png')
    if im is None:
        raise FileNotFoundError('im.png not found next to the script')
    H, W = im.shape[:2]
    S = 4
    im = cv2.resize(im, (W*S, H*S))
    
    # Convert to binary: background becomes white (255), text black (0)
    msk = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    msk = cv2.threshold(msk, 200, 255, cv2.THRESH_BINARY)[1]
    
    # Glue character fragments together: eroding the white background
    # grows the black text, so nearby fragments merge
    kernel1 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11, 13))
    kernel2 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (4, 5))
    msk = cv2.medianBlur(msk, 3)
    msk = cv2.erode(msk, kernel1)
    msk = cv2.erode(msk, kernel2)
    
    # Skeletonization-like operation (requires the opencv_contrib modules);
    # thinning expects white text on black, hence the inversion
    thinned = cv2.ximgproc.thinning(~msk)
    
    # Rebuild the final characters by thickening the skeleton
    msk = cv2.cvtColor(msk, cv2.COLOR_GRAY2BGR)
    thinned = cv2.cvtColor(thinned, cv2.COLOR_GRAY2BGR)
    thickened = cv2.erode(~thinned, np.ones((9, 15), np.uint8))
    thickened = cv2.medianBlur(thickened, 11)
    
    # Save a 2x2 grid: input and inverted mask on top, skeleton and result below
    top = np.hstack((im, ~msk))
    btm = np.hstack((thinned, thickened))
    cv2.imwrite(sys.path[0]+'/im_out.png', np.vstack((top, btm)))
    

    [Image: output of the script]


    More information about modules and their licenses: OpenCV, NumPy

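    Note that the thinning algorithm (cv2.ximgproc.thinning) lives in the opencv_contrib repository rather than in the main OpenCV modules. With pip it is provided by the opencv-contrib-python package, and you should check that repository's license before using it.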