I'm trying to run OCR on images like this one:
Unfortunately, tesseract is unable to read the number because of the noisy dots around the characters.
I tried using ImageMagick to improve the image quality, but had no luck.
Examples:
convert input.tif -level 0%,150% output.tif
convert input.tif -colorspace CMYK -separate output_%d.tif
Is there an efficient way to extract the characters from this kind of image?
Many thanks.
A simple closing operation (dilation followed by erosion) should give you the desired output. Below is a Python implementation using OpenCV.
import cv2
import numpy as np
img = cv2.imread(r'D:\Image\noiseOCR.png', 0)  # flag 0 loads the image as grayscale
kernel = np.ones((3, 3), np.uint8)  # 3x3 structuring element
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)