I have some files containing scanned receipts, and I need to extract the text from them using OCR. Because the print on a receipt fades over time, some words are no longer clear, which degrades the OCR result.
Some examples of faded words:
Are there any ways to restore the faded parts so that I can improve the OCR result?
I have tried image thresholding and image smoothing in OpenCV, but the results are not very satisfactory. Can the image be processed further?
This method is not perfect and does not suit every character; it works better if you first determine the range of characters, separate them, and then apply the method to each character individually. It is only a basic idea that you may be able to build on. The resulting characters will not look like the original font; they may just be more readable. That is to be expected with this approach: because the characters are damaged, identifying the original font is not easy.
import sys
import cv2
import numpy as np
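# Load the image (im.png is expected next to this script) and upscale it
im = cv2.imread(sys.path[0]+'/im.png')
if im is None:
    raise FileNotFoundError('could not read im.png')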
H, W = im.shape[:2]
S = 4
im = cv2.resize(im, (W*S, H*S))
# Convert to binary
msk = im.copy()
msk = cv2.cvtColor(msk, cv2.COLOR_BGR2GRAY)
msk = cv2.threshold(msk, 200, 255, cv2.THRESH_BINARY)[1]
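# Glue char blobs together: the text is black on white, so eroding
# the white background thickens the dark strokes and closes small gaps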
kernel1 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11, 13))
kernel2 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (4, 5))
msk = cv2.medianBlur(msk, 3)
msk = cv2.erode(msk, kernel1)
msk = cv2.erode(msk, kernel2)
# Skeletonization-like operation in OpenCV (needs the opencv_contrib
# modules); the mask is inverted so that the text is white
thinned = cv2.ximgproc.thinning(~msk)
# Make final chars
msk = cv2.cvtColor(msk, cv2.COLOR_GRAY2BGR)
thinned = cv2.cvtColor(thinned, cv2.COLOR_GRAY2BGR)
thicked = cv2.erode(~thinned, np.ones((9, 15), np.uint8))
thicked = cv2.medianBlur(thicked, 11)
# Save the output
top = np.hstack((im, ~msk))
btm = np.hstack((thinned, thicked))
cv2.imwrite(sys.path[0]+'/im_out.png', np.vstack((top, btm)))
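The answer suggests separating the characters before processing them. One simple way to attempt that, sketched here with NumPy only, is a vertical projection profile: columns that contain no text pixels are treated as gaps between characters. The names `split_columns` and `min_gap` are hypothetical, and the sketch assumes a single line of text in which text pixels are nonzero (for the `msk` above, that would mean passing `~msk` before it is converted back to BGR). Faded characters are often broken into pieces, so narrow gaps are merged; a suitable `min_gap` depends on the font and the scale factor.

```python
import numpy as np

def split_columns(binary, min_gap=1):
    """Return (start, end) column ranges that contain text pixels.

    binary  : 2-D array of one text line, text pixels nonzero.
    min_gap : runs separated by fewer than min_gap empty columns
              are merged into one range. `end` is exclusive.
    """
    has_ink = (binary > 0).any(axis=0)  # True for columns with text

    # Collect consecutive runs of columns that contain text
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(has_ink)))

    # Merge runs separated by a gap narrower than min_gap
    merged = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```

Each returned range can then be cropped with `binary[:, start:end]` and processed on its own. Characters that touch or overlap horizontally will not be split by this approach.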
More information about modules and their licenses: OpenCV, NumPy
Note that the thinning algorithm is located in the opencv_contrib repository; therefore, check its license before using it.