Tags: python, opencv, python-tesseract

Python - cv2 find Contours


I would like to find all the big elements in a document, but I do not know how to control the size of the detected regions (the document was downloaded from the Internet).

I have a document

[image: input document]

And I wrote some simple code:

import cv2
import pytesseract

image = cv2.imread('2.png')

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7, 7), 0)
thresh = cv2.threshold(
    blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 50))
dilate = cv2.dilate(thresh, kernel, iterations=1)

cv2.imwrite('1_dilated.png', dilate)

cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

cnts = cnts[0] if len(cnts) == 2 else cnts[1]

cnts = sorted(cnts, key=lambda x: cv2.boundingRect(x)[1])

for c in cnts:
    x, y, w, h = cv2.boundingRect(c)
    if h > 100 and w > 100:
        roi = image[y:y+h, x:x+w]
        cv2.rectangle(image, (x, y), (x+w, y+h), (36, 255, 12), 2)
        # ocr = pytesseract.image_to_string(roi)
        # print(ocr)
cv2.imwrite('1_boxes4.png', image)

But it only detects this:

[image: current detection result]

And I would like this:

[image: desired detection result]

How can I control the size of the detected area?

Thank you very much for all your comments


Solution

  • You are close, but you need to increase the number of iterations of the dilate operation. Also, a wider rectangular structuring element might help form the blobs of text better: the 3 x 50 kernel in your code is only 3 pixels wide, so it barely grows the text horizontally. Let's check out some possible improvements to your code:

    # imports:
    import cv2
    import numpy as np
    
    # Set image path
    imagePath = "D://opencvImages//"
    imageName = "F74Yq.png"    
    
    # Read image:
    inputImage = cv2.imread(imagePath + imageName)
    
    # Store a deep copy for results:
    inputCopy = inputImage.copy()
    
    # Convert BGR to grayscale:
    grayInput = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
    
    # Threshold via Otsu
    _, binaryImage = cv2.threshold(grayInput, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
    

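    The first part produces the binary image of the input. There's nothing fancy going on here, just direct thresholding via Otsu's method. With THRESH_BINARY_INV, dark text on a light background comes out white, so the text becomes the foreground that the dilation will grow. This is the binary image obtained: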
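To make the thresholding step less of a black box, here is a minimal NumPy sketch of the idea behind Otsu's method followed by an inverted binary threshold. It is only an illustration: the helper name `otsu_threshold` and the toy image are assumptions, and OpenCV's own implementation may differ in detail.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level that maximizes the between-class variance.

    Hypothetical helper for illustration; expects a uint8 image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)                # weight of the class "<= t"
    w1 = 1.0 - w0                       # weight of the class "> t"
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate threshold t.
    denom = w0 * w1
    between = np.zeros(256)
    valid = denom > 0
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / denom[valid]
    return int(np.argmax(between))

# Toy "document": light background (220) with a dark 2 x 2 patch of "text" (30).
gray = np.full((4, 4), 220, dtype=np.uint8)
gray[1:3, 1:3] = 30

t = otsu_threshold(gray)

# Inverted binary threshold: pixels above t become 0, the rest become 255.
binary_inv = np.where(gray > t, 0, 255).astype(np.uint8)
print(binary_inv)
```

The dark patch ends up white and the background black, which is the polarity the dilation step needs.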

    Now, let's apply the dilate operation. Let's use a 9 x 9 rectangular kernel and set the number of iterations to 5. Be careful not to dilate too much, because blobs of text from different portions of the document could end up joined:

    # Set kernel (structuring element) size:
    kernelSize = (9, 9)
    
    # Set operation iterations:
    opIterations = 5
    
    # Get the structuring element:
    morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
    
    # Perform Dilate:
    dilateImage = cv2.morphologyEx(binaryImage, cv2.MORPH_DILATE, morphKernel,
                                   iterations=opIterations, borderType=cv2.BORDER_REFLECT101)
    

    This is the result:
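Kernel size and iteration count together control how far the blobs grow. Repeating a dilation with a full rectangular kernel is equivalent to a single dilation with a larger rectangle, so the combined reach can be estimated up front. A small sketch, where `effective_kernel_size` is a hypothetical helper and not an OpenCV function:

```python
def effective_kernel_size(kernel_size, iterations):
    """Size of the single rectangular kernel that matches `iterations`
    repeated dilations with a full rectangular kernel of `kernel_size`.

    Hypothetical helper; kernel_size is (width, height), as in
    cv2.getStructuringElement.
    """
    width, height = kernel_size
    return (iterations * (width - 1) + 1, iterations * (height - 1) + 1)

# The settings used above: 9 x 9 kernel, 5 iterations.
print(effective_kernel_size((9, 9), 5))   # (41, 41)

# The settings from the question: 3 x 50 kernel, 1 iteration.
print(effective_kernel_size((3, 50), 1))  # (3, 50)
```

So the settings above join text that is up to roughly 40 pixels apart in either direction, while the kernel in the question joined lines vertically but barely touched horizontal gaps. Lowering the iteration count or the kernel size is the knob to turn if separate blocks start merging.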

    Ok, now let's just detect external contours and get their bounding boxes so we can draw rectangles around the target areas. Note that I'm drawing the rectangles on a deep copy of the input:

    # Find the external contours on the dilated image
    # (OpenCV 4.x returns two values; OpenCV 3.x returns three):
    contours, hierarchy = cv2.findContours(dilateImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    
    # Look for the outer bounding boxes (no children):
    for c in contours:
    
        # Get the contours bounding rectangle:
        boundRect = cv2.boundingRect(c)
    
        # Get the dimensions of the bounding rectangle:
        rectX = boundRect[0]
        rectY = boundRect[1]
        rectWidth = boundRect[2]
        rectHeight = boundRect[3]
    
        # Set bounding rectangle:
        color = (0, 0, 255)
        cv2.rectangle(inputCopy, (rectX, rectY),
                      (rectX + rectWidth, rectY + rectHeight), color, 5)
    
        # Show the rectangles drawn so far; press any key to move on to the next contour:
        cv2.imshow("Bounding Rectangles", inputCopy)
        cv2.waitKey()
    

    This is the final result:
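If small blobs still show up, the size check from the question (`h > 100 and w > 100`) can be kept, but expressed relative to the image size so it adapts to documents of different resolutions. A minimal sketch, assuming boxes are `(x, y, w, h)` tuples as returned by `cv2.boundingRect`; the helper name `filter_boxes` and the default fractions are hypothetical and would need tuning:

```python
def filter_boxes(boxes, image_shape, min_width_frac=0.1, min_height_frac=0.02):
    """Keep only boxes whose width and height reach a fraction of the image size.

    Hypothetical helper; image_shape is (height, width, ...) like ndarray.shape.
    """
    image_height, image_width = image_shape[:2]
    min_w = min_width_frac * image_width
    min_h = min_height_frac * image_height
    return [(x, y, w, h) for (x, y, w, h) in boxes if w >= min_w and h >= min_h]

# Made-up boxes for an image that is 1000 pixels high and 800 pixels wide.
boxes = [(10, 10, 500, 120), (30, 400, 12, 9), (50, 600, 300, 40)]
kept = filter_boxes(boxes, (1000, 800))
print(kept)  # [(10, 10, 500, 120), (50, 600, 300, 40)]
```

In the loop above, this would amount to collecting the bounding rectangles first, then calling something like `filter_boxes(rects, inputImage.shape)` before drawing.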