python image opencv image-processing line

Removing slanted horizontal lines in an image

I have an image in which horizontal lines underline the text. I tried several techniques in turn, including cv2.HoughLinesP and cv2.HoughLines, and then this code:

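import cv2
import numpy as np

image = cv2.imread('D:\\detect_words.jpg')
# Invert so that dark ink becomes bright foreground
gray = 255 - cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
for row in range(gray.shape[0]):
    # Fraction of pixels in this row that count as foreground
    avg = np.average(gray[row, :] > 16)
    if avg > 0.25:
        # Mark the row in red on the colour image and blank it in the inverted gray image
        cv2.line(image, (0, row), (gray.shape[1]-1, row), (0, 0, 255))
        cv2.line(gray, (0, row), (gray.shape[1]-1, row), (0, 0, 0), 1)
cv2.imwrite('D:\\words\\final_removed.jpg', image)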

With this, I am able to get to this result.

After this phase, I apply erosion and dilation:

kernel = np.ones((3,3), np.uint8) 
img_erosion = cv2.erode(255-gray, kernel, iterations=1) 
img_dilation = cv2.dilate(img_erosion, kernel, iterations=1) 
cv2.imwrite('D:\\words\\final_removed4.jpg',255-img_dilation)

My question: this does remove horizontal lines, but it also erases pixels from the words, and not all of the horizontal lines are removed (for example, the horizontal line above AGE is still present). Is there an approach that minimizes this loss and removes all of the horizontal lines?

Solution

Here's an approach:

Convert the image to grayscale
Apply Otsu's threshold to get a binary image
Create a horizontal kernel and morph open to detect the lines
Find contours and paint over the detected lines

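After converting to grayscale, we apply Otsu's threshold to obtain a binary image. The threshold is inverted (cv2.THRESH_BINARY_INV) so that the dark text and lines become white foreground.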

image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

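Now we create a horizontal kernel and morph open with it to obtain a mask of the detected lines. Opening with a wide, 1-pixel-tall kernel keeps only long horizontal runs of foreground pixels, so the text strokes are discarded.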

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (45,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

Here are the detected lines drawn on the original image

From here we find contours on this mask and paint over them in white, which effectively removes the horizontal lines and gives our result

cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

for c in cnts:
    cv2.drawContours(image, [c], -1, (255,255,255), 3)

Now that the horizontal lines are removed, you can try to repair the text with cv2.MORPH_CLOSE and a cv2.MORPH_CROSS kernel, experimenting with various kernel sizes. There is a tradeoff: a larger kernel closes more of the holes, but dilating too much destroys detail in the text. Another approach is to use image inpainting to fill in the holes. I'll leave this step to you.

Full code

import cv2

image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (45,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

for c in cnts:
    cv2.drawContours(image, [c], -1, (255,255,255), 3)

cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.waitKey()