
Removing slant horizontal lines in image


original image

I have an image in which horizontal lines run underneath the text. I tried several techniques in turn, including cv2.HoughLinesP and cv2.HoughLines, and then this code:

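import cv2
import numpy as np

image = cv2.imread('D:\\detect_words.jpg')
# Invert so that ink (text and lines) is bright and the background is dark
gray = 255 - cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
for row in range(gray.shape[0]):
    # Fraction of pixels in this row that count as ink
    avg = np.average(gray[row, :] > 16)
    if avg > 0.25:
        # Mark the row in red on the color image and erase it in the grayscale image
        cv2.line(image, (0, row), (gray.shape[1]-1, row), (0, 0, 255))
        cv2.line(gray, (0, row), (gray.shape[1]-1, row), (0, 0, 0), 1)
cv2.imwrite('D:\\words\\final_removed.jpg', image)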

After this processing, I get the following result.

After this step, I apply erosion and dilation:

kernel = np.ones((3,3), np.uint8) 
img_erosion = cv2.erode(255-gray, kernel, iterations=1) 
img_dilation = cv2.dilate(img_erosion, kernel, iterations=1) 
cv2.imwrite('D:\\words\\final_removed4.jpg',255-img_dilation)

final image after dilation and erosion

My question: removing the horizontal lines this way only partly works. Pixels are lost from the words, and not all of the horizontal lines are removed (for example, the horizontal line above AGE is still present). Is there a better approach that minimizes this loss and removes all of the horizontal lines?


Solution

  • Here's an approach:

    • Convert image to grayscale
    • Otsu's threshold to get binary image
    • Create horizontal kernel and morph open to detect lines
    • Find contours and draw in detected lines

    After converting to grayscale, we apply Otsu's threshold to obtain a binary image

    image = cv2.imread('1.jpg')
    gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    

    Next, we create a horizontal kernel and apply a morphological opening with it to obtain a mask of the detected lines

    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (45,1))
    detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
    

    Here are the detected lines drawn on the original image

    From here, we find contours on this mask and draw over them in white, which removes the horizontal lines and gives our result

    cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    
    for c in cnts:
        cv2.drawContours(image, [c], -1, (255,255,255), 3)
    

    Now that the horizontal lines are removed, you can try to repair the text with cv2.MORPH_CLOSE and a cv2.MORPH_CROSS kernel, experimenting with various kernel sizes. There is a tradeoff: a larger kernel closes more of the holes, but dilating too much loses detail in the text. Another approach is to use image inpainting to fill in the holes. I'll leave this step to you.

    Full code

    import cv2
    
    image = cv2.imread('1.jpg')
    gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    
    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (45,1))
    detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
    
    cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    
    for c in cnts:
        cv2.drawContours(image, [c], -1, (255,255,255), 3)
    
    cv2.imshow('thresh', thresh)
    cv2.imshow('detected_lines', detected_lines)
    cv2.imshow('image', image)
    cv2.waitKey()