python opencv image-processing tesseract captcha

How to remove Empty Spaces and Dots in Text from Image with OpenCV


I'm processing images with OpenCV and Python and need to remove the dots/noise from the image. Both the background and the text are covered with empty dots/lines. Here is an example: (example image)

With this code I am able to remove the background:

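import cv2
import numpy as np

# Read as grayscale
img = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError('image.png could not be read')

# Inverted threshold: dark text/dots become white (255) foreground
_, blackAndWhite = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

# Label 8-connected foreground blobs; label 0 is the background
nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(
    blackAndWhite, connectivity=8, ltype=cv2.CV_32S)
sizes = stats[1:, cv2.CC_STAT_AREA]  # pixel area of each non-background component
img2 = np.zeros(labels.shape, np.uint8)

for i in range(nlabels - 1):
    if sizes[i] >= 50:  # filter small dotted regions (area < 50 px)
        img2[labels == i + 1] = 255

# Back to black text on a white background
res = cv2.bitwise_not(img2)

cv2.imwrite('res.png', res)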

The result is: (result image)

How can I fill the empty spaces in the text? Or, how can I make this image readable for Tesseract OCR?


Solution

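  • I'm not sure if this will work, but you can try a dilate+erode (closing) and/or an erode+dilate (opening) transform. Closing fills small holes and gaps inside the white foreground; opening removes small white specks.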

    See the illustration of the Open and Close image transformations.

    This can be implemented with OpenCV: https://docs.opencv.org/4.5.5/d9/d61/tutorial_py_morphological_ops.html

    Best Regards