Tags: python, opencv, computer-vision, python-tesseract

How to extract the circular text from that image?


I have an image that contains text arranged in a circle. There are two circles in the image. I want to remove the text of the inner circle and extract the text of the outer circle. How can I remove the inner text, and how do I then extract the outer text? What are the steps to solve this problem?

Input image:

[Image: input image]


Solution

  • Your image was a nice toy example for playing around with cv2.warpPolar, so I wrote some code that I'll share here, too. This would be my approach:

    1. Grayscale and binarize the input image, mainly to get rid of JPG artifacts.

    2. Crop the center part of the image to get rid of the large areas on the left and right. That simplifies the contour detection that follows.

      [Image: cropped center part]

    3. Find (nested) contours, cf. cv2.RETR_TREE. See this answer for an extensive explanation of contour hierarchies.

    4. Filter and sort the found contours by area, such that only the four circle-related contours (inner and outer edges of the two circles) are kept.

    5. Remove the inner text by simply painting over it, using the contours of the inner circle.

      [Image: inner text removed]

      If explicitly needed, do the same for the original image.

      [Image: inner text removed in the original image]

    6. Rotate the image before remapping, cf. the explanations in the linked cv2.warpPolar documentation. Then remap the image to polar coordinates, and rotate the result for proper OCR.

      [Image: remapped and rotated image used for OCR]

    7. Run pytesseract with a whitelist that allows only capital letters and the space character.

    Here is the full code for the whole pipeline, with its output in the final comment:

    import cv2
    import pytesseract
    
    # Read image
    img = cv2.imread('fcJAc.jpg')
    
    # Convert to grayscale, and binarize, especially for removing JPG artifacts
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)[1]
    
    # Crop center part of image to simplify following contour detection
    h, w = gray.shape
    l = (w - h) // 2
    gray = gray[:, l:l+h]
    
    # Find (nested) contours, cf. cv2.RETR_TREE; findContours returns two
    # values in OpenCV 4.x, but three in OpenCV 3.x, so handle both cases
    cnts = cv2.findContours(gray, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    
    # Filter and sort contours on area
    cnts = [cnt for cnt in cnts if cv2.contourArea(cnt) > 10000]
    cnts = sorted(cnts, key=cv2.contourArea)
    
    # Remove inner text by painting over using found contours
    # Contour index 1 = outer edge of inner circle
    gray = cv2.drawContours(gray, cnts, 1, 0, cv2.FILLED)
    
    # If specifically needed, also remove text in the original image
    # Contour index 0 = inner edge of inner circle (to keep inner circle itself)
    img[:, l:l+h] = cv2.drawContours(img[:, l:l+h], cnts, 0, (255, 255, 255),
                                     cv2.FILLED)
    
    # Rotate image before remapping to polar coordinate space to maintain
    # circular text en bloc after remapping
    gray = cv2.rotate(gray, cv2.ROTATE_90_COUNTERCLOCKWISE)
    
    # Actual remapping to polar coordinate space
    gray = cv2.warpPolar(gray, (-1, -1), (h // 2, h // 2), h // 2,
                         cv2.INTER_CUBIC + cv2.WARP_POLAR_LINEAR)
    
    # Rotate result for OCR
    gray = cv2.rotate(gray, cv2.ROTATE_90_COUNTERCLOCKWISE)
    
    # Actual OCR, limiting to capital letters only
    config = '--psm 6 -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ "'
    text = pytesseract.image_to_string(gray, config=config)
    print(text.replace('\n', '').replace('\f', ''))
    # CIRCULAR TEXT PHOTOSHOP TUTORIAL
    
    ----------------------------------------
    System information
    ----------------------------------------
    Platform:      Windows-10-10.0.19041-SP0
    Python:        3.9.1
    PyCharm:       2021.1.1
    OpenCV:        4.5.2
    pytesseract:   5.0.0-alpha.20201127
    ----------------------------------------
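
As a footnote to step 6, here is a rough sketch of why the two rotations are needed. It follows the linear mapping as I read it in the cv2.warpPolar documentation (with the default output size, the width is about the maximum radius and the height about the maximum radius times pi). The function name polar_destination is hypothetical, and the code is an illustration of the formula, not OpenCV's actual implementation.

```python
import math


def polar_destination(x, y, center, max_radius):
    """Approximate (column, row) in the linear polar image for a source pixel.

    Illustration of the documented mapping only; it ignores interpolation
    and any rounding OpenCV applies internally.
    """
    cx, cy = center
    # Default output size: width ~ max_radius, height ~ max_radius * pi
    width = round(max_radius)
    height = round(max_radius * math.pi)
    k_lin = width / max_radius        # columns per pixel of radius
    k_angle = height / (2 * math.pi)  # rows per radian
    radius = math.hypot(x - cx, y - cy)
    # Angle in [0, 2*pi), measured from the ray pointing right of the center;
    # the image y axis points down, so it grows clockwise on screen
    angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
    return k_lin * radius, k_angle * angle


# A point right of the center lands in the first row, a point below the
# center a quarter of the way down, a point above it three quarters down
right = polar_destination(150, 100, (100, 100), 100)
below = polar_destination(100, 150, (100, 100), 100)
above = polar_destination(100, 50, (100, 100), 100)
```

Columns encode the radius and rows encode the angle, so text written along a circle ends up running from top to bottom, which is why the result is rotated by 90 degrees before OCR. The remapped image is also cut open along the ray pointing right from the center, so rotating the input first moves that cut to a place where it does not split the text.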