python opencv computer-vision python-tesseract

How can I extract circular text from an image?

I have an image that contains text laid out in circular form. The image has two circles. I want to remove the inner circle's text and extract the outer circle's text. How do I remove the inner text, and once it is removed, how do I extract the outer text? What are the steps to solve this problem?

Input image:

Solution

Your image is a nice toy example for playing around with cv2.warpPolar, so I wrote some code, which I'll share here. This is my approach:

Convert the input image to grayscale and binarize it, mainly to get rid of JPG artifacts.
Crop the center part of the image to discard the large areas on the left and right, which simplifies the contour detection that follows.
Find (nested) contours, cf. cv2.RETR_TREE. See this answer for an extensive explanation of contour hierarchies.
Filter the found contours by area and sort them, so that only the four circle-related contours (inner and outer edges of the two circles) are kept.
Remove the inner text by painting over it, using the contours of the inner circle. If explicitly needed, do the same for the original image.
Rotate the image before remapping, cf. the explanations in the linked cv2.warpPolar documentation. Then remap the image to polar coordinates, and rotate the result for proper OCR.
Run pytesseract, whitelisting only capital letters and the space character.
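
To illustrate the filter-and-sort step in isolation, here is a minimal pure-Python sketch. It uses made-up polygons instead of real contours: the radii (80, 100, 180, 200), the letter-sized squares, and the helpers circle_polygon and polygon_area are all hypothetical, and polygon_area is only a shoelace-formula stand-in for cv2.contourArea. The area threshold of 10000 is the one used in the answer.

```python
import math


def circle_polygon(radius, n=360):
    """Closed polygon approximating a circle centered at the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def polygon_area(points):
    """Absolute polygon area via the shoelace formula.

    Stand-in for cv2.contourArea, used here only to avoid needing OpenCV.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Hypothetical edges: inner circle at radii 80/100, outer circle at 180/200,
# deliberately listed out of order
contours = [circle_polygon(r) for r in (200, 80, 180, 100)]
# Letter-sized clutter: 20x20 squares with area 400
contours += [[(0, 0), (20, 0), (20, 20), (0, 20)] for _ in range(5)]

# Same idea as in the answer: drop small contours, then sort by area
kept = [c for c in contours if polygon_area(c) > 10000]
kept = sorted(kept, key=polygon_area)

# Each circle polygon starts at (radius, 0), so this recovers the radii
kept_radii = [round(math.hypot(*c[0])) for c in kept]
print(kept_radii)
```

After sorting in ascending order, index 0 is the inner edge of the inner circle and index 1 is its outer edge, which is why those two indices are the ones used for painting over the inner text.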

Here's the full code; the OCR output is shown in the final comment:

import cv2
import pytesseract

# Read image
img = cv2.imread('fcJAc.jpg')

# Convert to grayscale, and binarize, especially for removing JPG artifacts
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)[1]

# Crop center part of image to simplify following contour detection
h, w = gray.shape
l = (w - h) // 2
gray = gray[:, l:l+h]

# Find (nested) contours, cf. cv2.RETR_TREE
cnts = cv2.findContours(gray, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
# findContours returns 2 values in OpenCV 2.4 and 4.x, but 3 values in 3.x
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# Filter and sort contours on area
cnts = [cnt for cnt in cnts if cv2.contourArea(cnt) > 10000]
cnts = sorted(cnts, key=cv2.contourArea)

# Remove inner text by painting over using found contours
# Contour index 1 = outer edge of inner circle
gray = cv2.drawContours(gray, cnts, 1, 0, cv2.FILLED)

# If specifically needed, also remove text in the original image
# Contour index 0 = inner edge of inner circle (to keep inner circle itself)
img[:, l:l+h] = cv2.drawContours(img[:, l:l+h], cnts, 0, (255, 255, 255),
                                 cv2.FILLED)

# Rotate image before remapping to polar coordinate space to maintain
# circular text en bloc after remapping
gray = cv2.rotate(gray, cv2.ROTATE_90_COUNTERCLOCKWISE)

# Actual remapping to polar coordinate space
gray = cv2.warpPolar(gray, (-1, -1), (h // 2, h // 2), h // 2,
                     cv2.INTER_CUBIC + cv2.WARP_POLAR_LINEAR)

# Rotate result for OCR
gray = cv2.rotate(gray, cv2.ROTATE_90_COUNTERCLOCKWISE)

# Actual OCR, limiting to capital letters only
config = '--psm 6 -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ "'
text = pytesseract.image_to_string(gray, config=config)
print(text.replace('\n', '').replace('\f', ''))
# CIRCULAR TEXT PHOTOSHOP TUTORIAL
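
To see why the image is rotated both before and after the remapping, it can help to look at where a single source pixel ends up. The following is a rough sketch of the linear mapping as described in the cv2.warpPolar documentation, written in plain Python. The helper warp_polar_linear_coords, the 200x200 image size, and the sample points are assumptions for illustration; the function does not call OpenCV and ignores interpolation.

```python
import math


def warp_polar_linear_coords(x, y, center, max_radius, dsize=None):
    """Approximate destination (column, row) of source pixel (x, y).

    Follows the linear mapping from the cv2.warpPolar documentation:
    column = Kmag * magnitude, row = Kangle * angle.
    """
    if dsize is None:
        # Documented default when dsize is not positive, e.g. (-1, -1)
        width = round(max_radius)
        height = round(max_radius * math.pi)
    else:
        width, height = dsize
    k_mag = width / max_radius
    k_angle = height / (2 * math.pi)
    dx, dy = x - center[0], y - center[1]
    # Angle in [0, 2*pi); image y points down, so it grows clockwise on screen
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return k_mag * math.hypot(dx, dy), k_angle * angle


center, max_radius = (100, 100), 100          # hypothetical 200x200 image
right = warp_polar_linear_coords(190, 100, center, max_radius)   # 3 o'clock
bottom = warp_polar_linear_coords(100, 190, center, max_radius)  # 6 o'clock
top = warp_polar_linear_coords(100, 10, center, max_radius)      # 12 o'clock
print(right, bottom, top)
```

In this mapping the radius runs along the columns and the angle along the rows, so circular text comes out running vertically, hence the rotation after remapping. Row 0 corresponds to the 3 o'clock direction, so any text crossing that direction would be split between the top and bottom of the result; rotating the input first moves that seam elsewhere.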

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.1
OpenCV:        4.5.2
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
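
The version check applied to the cv2.findContours result in the code above exists because the return value differs between OpenCV releases: 2.4 and 4.x return (contours, hierarchy), while 3.x returns (image, contours, hierarchy). A small helper makes that explicit. The name grab_contours and the stand-in tuples below are assumptions for illustration; no OpenCV call is made here.

```python
def grab_contours(result):
    """Return the contour list from a cv2.findContours result tuple."""
    if len(result) == 2:
        # OpenCV 2.4 and 4.x: (contours, hierarchy)
        return result[0]
    if len(result) == 3:
        # OpenCV 3.x: (image, contours, hierarchy)
        return result[1]
    raise ValueError(f"unexpected findContours result of length {len(result)}")


# Stand-in tuples mimicking the two return shapes
fake_v4_result = (["cnt_a", "cnt_b"], "hierarchy")
fake_v3_result = ("image", ["cnt_a", "cnt_b"], "hierarchy")

print(grab_contours(fake_v4_result) == grab_contours(fake_v3_result))
```

With the real library, the call would presumably look like cnts = grab_contours(cv2.findContours(gray, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)).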