Tags: python, image, opencv, image-preprocessing

Remove noise from image without losing data in OpenCV


I used this code:

    horizontalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontalsize, 1))
    horizontal = cv2.erode(horizontal, horizontalStructure, (-1, -1))
    horizontal = cv2.dilate(horizontal, horizontalStructure, (-1, -1))

to remove lines.

I also applied some filters to remove the noise and make the font bolder:

    blur = cv2.GaussianBlur(img, (11, 11), 0)
    thresh = cv2.threshold(blur, 80, 255, cv2.THRESH_BINARY)[1]
    kernel = np.ones((2, 1), np.uint8)
    dilation = cv2.erode(thresh, kernel, iterations=1)
    dilation = cv2.bitwise_not(dilation)

Despite the thresholding and the other filters, a lot of noise remained.

This is the result I want to reach:

Do you know an OpenCV filter that will help me achieve this result?


Solution

  • The following solution is neither perfect nor generic, but I hope it is good enough for your needs.

    To remove the lines, I suggest using cv2.connectedComponentsWithStats to find clusters, and then masking the wide or tall clusters.

    The solution uses the following stages:

    • Convert image to Grayscale.
    • Apply threshold and invert polarity.
      Use automatic thresholding by applying flag cv2.THRESH_OTSU.
    • Use "close" morphological operation to close small gaps.
    • Find connected components (clusters) with statistics.
    • Iterate the clusters, and delete clusters with large width or large height.
      Also remove very small clusters, which are considered to be noise.
    • The top and left margins are cleaned "manually".
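
To illustrate what the cv2.THRESH_OTSU flag automates, here is a NumPy-only sketch of Otsu's method, which picks the threshold that maximizes the between-class variance of the histogram. It is not OpenCV's implementation; the `otsu_threshold` helper and the synthetic image are hypothetical and only meant to show the idea.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the 8-bit threshold that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of the class "value <= t"
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance for every t; set to 0 where one class is empty.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

# Synthetic page (assumption): dark "ink" (30) on bright "paper" (220).
gray = np.full((20, 20), 220, np.uint8)
gray[5:15, 5:15] = 30

t = otsu_threshold(gray)
# Same polarity as cv2.THRESH_BINARY_INV: pixels above t become 0, the rest 255.
binary_inv = np.where(gray > t, 0, 255).astype(np.uint8)
```

With the inverted polarity the ink becomes the white foreground, which is what the connected-components stage expects.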

    Here is the complete code:

    import numpy as np
    import cv2
    
    img = cv2.imread('Heshbonit.jpg')  # Read input image
    
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Convert to Grayscale.
    
    ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # Convert to binary and invert polarity
    
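    # Use "close" morphological operation to close small gaps:
    # first with a 1x2 (horizontal) kernel, then with a 2x1 (vertical) kernel.
    # The kernels are explicit uint8 arrays, since OpenCV may reject NumPy's default integer dtype.
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((1, 2), np.uint8))
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((2, 1), np.uint8))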
    
    nlabel,labels,stats,centroids = cv2.connectedComponentsWithStats(thresh, connectivity=8)
    
    thresh_size = 100
    
    # Delete all lines by filling wide or tall clusters with zeros.
    # Also delete very small clusters (assumed to be noise).
    for i in range(1, nlabel):  # Label 0 is the background
        is_line = (stats[i, cv2.CC_STAT_WIDTH] > thresh_size) or (stats[i, cv2.CC_STAT_HEIGHT] > thresh_size)
        is_noise = stats[i, cv2.CC_STAT_AREA] < 4
        if is_line or is_noise:
            thresh[labels == i] = 0
    
    # Clean left and top margins "manually":
    thresh[:, 0:30] = 0
    thresh[0:10, :] = 0
    
    # Inverse polarity
    thresh = 255 - thresh
    
    # Write result to file
    cv2.imwrite('thresh.png', thresh)