Is there a way to keep only rectangular detections from OpenCV contour detection and ignore contours caused by text?

I want to detect all text boxes of a webpage visually using OpenCV contour detection. However, it is also detecting the text itself. I need to filter out those results and keep only the detections of the rectangular boxes.

In short, I want only the rectangular detections of boxes and buttons, with every detection that comes from text filtered out.

Solution

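Use the code below as a starting point. It thresholds the page, finds the outer contours, and keeps only those whose bounding rectangle covers more than 1000 pixels, which discards most individual characters: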

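import cv2
import numpy as np

img = cv2.imread('amazon.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# inverse thresholding: pixels darker than 195 become white (foreground)
thresh = cv2.threshold(gray, 195, 255, cv2.THRESH_BINARY_INV)[1]

# find outer contours only; [-2] picks the contour list in both OpenCV 3.x and 4.x
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

# start from an all-white single-channel mask and paint each kept box black
mask = np.ones(img.shape[:2], dtype="uint8") * 255
for c in contours:
    # get the bounding rect
    x, y, w, h = cv2.boundingRect(c)
    # keep only large regions; small ones are mostly individual characters
    if w * h > 1000:
        cv2.rectangle(mask, (x, y), (x + w, y + h), 0, -1)

# invert the mask so the kept boxes are white, then copy them from the original
res_final = cv2.bitwise_and(img, img, mask=cv2.bitwise_not(mask))

cv2.imshow("boxes", mask)
cv2.imshow("final image", res_final)
cv2.waitKey(0)
cv2.destroyAllWindows()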

Output:

Figure 1: Original image

Figure 2: Mask of the desired contours

Figure 3: Desired contours detected in the original image