Tags: python-3.x, opencv, image-processing, tesseract

How to detect the boundaries of records in an image?


I have a large number of high-resolution JPEG images (2500 x 3500 pixels) that look roughly like this:

enter image description here

Each number designates a separate record, and my aim is to convert these records to text.

I am aware of various OCR solutions such as OpenCV or Tesseract, but my problem is detecting the boundary of each record, so that I can later feed each one to the OCR. How can I achieve something like this:

enter image description here


Solution

  • Since every record starts with a blue number, you can threshold on blue-ish colors in the HSV color space to mask those numbers. Apply morphological closing to that mask so that the digits of each number merge into a single "box". Then find the contours in the closed mask and take the top y coordinate of each one. Finally, extract the individual records from the original image by slicing from one y coordinate to the next (plus or minus a few pixels) across the full image width.

    Here's some code for that:

    import cv2
    import numpy as np
    
    # Read image
    img = cv2.imread('CfOBO.png')
    
    # Thresholding blue-ish colors using HSV color space
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    blue_lower = (90, 128, 64)
    blue_upper = (135, 255, 192)
    blue_mask = cv2.inRange(hsv, blue_lower, blue_upper)
    
    # Morphological closing with an 11x11 kernel, so the digits of one number merge into one blob
    blue_mask = cv2.morphologyEx(blue_mask, cv2.MORPH_CLOSE, np.ones((11, 11), np.uint8))
    
    # Find contours; findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x
    cnts = cv2.findContours(blue_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    
    # Get y coordinate from bounding rectangle for each contour
    y = sorted([cv2.boundingRect(cnt)[1] for cnt in cnts])
    
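    # Manually add end of last record (offset by the 5 px margin subtracted below)
    y.append(img.shape[0] + 5)
    
    # Extract records; clamp the start so a number near the image top cannot give a negative index
    records = [img[max(y[i] - 5, 0):y[i + 1] - 5, ...] for i in range(len(cnts))]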
    
    # Show records
    for record in records:
        cv2.imshow('Record', record)
        cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    There's plenty of room for optimization. For example, the last record may be followed by a large blank area, because I simply used the image bottom as its lower end. Still, the general workflow should do what you want. (I left out the subsequent pytesseract step.)

    ----------------------------------------
    System information
    ----------------------------------------
    Platform:      Windows-10-10.0.16299-SP0
    Python:        3.9.1
    NumPy:         1.20.1
    OpenCV:        4.5.1
    ----------------------------------------
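As one possible take on the optimizations mentioned above, here is a minimal NumPy-only sketch that (a) merges y coordinates lying very close together, which can happen if one blue number yields more than one contour, and (b) trims the blank area after the last record. The helper names `merge_close`, `trim_bottom`, and `split_records`, as well as the `min_gap`, `white`, and `margin` values, are assumptions for illustration and are not part of the original answer; they would need tuning on real scans.

```python
import numpy as np


def merge_close(ys, min_gap=20):
    """Drop any y that lies within min_gap px of the previously kept one.

    min_gap is an assumed value; pick something smaller than a record's height.
    """
    merged = []
    for y in sorted(ys):
        if not merged or y - merged[-1] >= min_gap:
            merged.append(y)
    return merged


def trim_bottom(record, white=240, margin=5):
    """Cut trailing near-white rows, keeping `margin` rows of padding.

    A row counts as blank if every pixel is at least `white` (assumed threshold).
    """
    gray = record.mean(axis=2) if record.ndim == 3 else record
    rows_with_ink = np.flatnonzero((gray < white).any(axis=1))
    if rows_with_ink.size == 0:
        return record  # only background found; leave the record untouched
    end = min(int(rows_with_ink[-1]) + 1 + margin, record.shape[0])
    return record[:end]


def split_records(img, ys, margin=5, min_gap=20):
    """Slice img into full-width records starting `margin` px above each y."""
    ys = merge_close(ys, min_gap)
    if not ys:
        return []
    # The image bottom closes the last record; add margin to cancel the offset below
    bounds = ys + [img.shape[0] + margin]
    records = [
        img[max(top - margin, 0):bottom - margin]
        for top, bottom in zip(bounds[:-1], bounds[1:])
    ]
    records[-1] = trim_bottom(records[-1], margin=margin)
    return records
```

With the code from the answer, this could be called as `records = split_records(img, [cv2.boundingRect(cnt)[1] for cnt in cnts])`, in place of the sorting, appending, and slicing steps.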