image-processing imagemagick ocr tesseract

Lower noise in picture to enable OCR with tesseract


I'm trying to do OCR on this kind of image:

[image: example input]

Unfortunately, Tesseract is unable to read the number because of the noisy points around the characters.

I tried using ImageMagick to improve the image quality, but had no luck.

Examples:

 convert input.tif -level 0%,150% output.tif

 convert input.tif -colorspace CMYK -separate output_%d.tif

[image]

Is there an efficient way to retrieve the characters from this kind of image?

Many thanks.


Solution

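  • A simple morphological closing (dilation followed by erosion) should give you the desired output. Assuming dark text and dark noise on a light background, it removes dark specks smaller than the kernel while leaving larger dark shapes, such as the characters, largely intact. Below is a Python implementation using OpenCV.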

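    import cv2
    import numpy as np

    img = cv2.imread(r'D:\Image\noiseOCR.png', 0)  # 0 = load as grayscale
    kernel = np.ones((3, 3), np.uint8)  # 3x3 structuring element
    closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)  # dilate, then erode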
    

    Denoised Output image
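
To see why closing removes the specks, here is a minimal pure-Python sketch of the same idea on a tiny synthetic grid. It is an illustration only, not what OpenCV does internally: the helper names (`dilate`, `erode`, `close`) and the test image are made up for this example, the kernel is fixed at 3x3, and the window is simply clipped at the image border.

```python
# Pure-Python sketch of grayscale closing with a fixed 3x3 square kernel.
# Helper names are illustrative assumptions, not part of OpenCV.

def _window(img, r, c):
    """Values in the 3x3 window around (r, c), clipped at the image border."""
    rows, cols = len(img), len(img[0])
    return [
        img[rr][cc]
        for rr in range(max(r - 1, 0), min(r + 2, rows))
        for cc in range(max(c - 1, 0), min(c + 2, cols))
    ]

def dilate(img):
    """Each pixel takes the brightest value in its window (dark areas shrink)."""
    return [[max(_window(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]

def erode(img):
    """Each pixel takes the darkest value in its window (dark areas grow back)."""
    return [[min(_window(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]

def close(img):
    """Closing: dilation followed by erosion."""
    return erode(dilate(img))

# Synthetic 7x9 image: white background, one thick dark stroke, one noise pixel.
WHITE, BLACK = 255, 0
img = [[WHITE] * 9 for _ in range(7)]
for r in range(1, 6):
    for c in range(2, 5):
        img[r][c] = BLACK   # 3-pixel-wide stroke, standing in for a character
img[3][7] = BLACK           # isolated dark speck, standing in for the noise

closed = close(img)

# Render both images as text: '#' for dark pixels, '.' for light ones.
for before, after in zip(img, closed):
    print("".join("#" if v == BLACK else "." for v in before), "  ",
          "".join("#" if v == BLACK else "." for v in after))
```

The dilation step erases anything dark that is narrower than the kernel, which includes the single-pixel speck, and thins the stroke to one pixel wide. The erosion step then grows the surviving stroke back to its original size, but it cannot bring back the speck because nothing of it is left. If the noise in the real image is larger than the kernel, a bigger kernel may be needed, at the risk of also damaging thin parts of the characters.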