Tags: python, image, opencv, image-processing, image-recognition

Extract text inside rectangle from image


I have an image with multiple red rectangles. The rectangle extraction works and the output is good.

I'm using https://github.com/autonise/CRAFT-Remade for text recognition.

Original:

[original image]

My image:

[image with the red rectangles]

I tried to extract the text inside all the rectangles with pytesseract, but without success. Output result:

r
2
aseeaaei
ae

How can we extract the text from this image accurately?

Part of the code:

import os
import cv2
import numpy as np

def saveResult(img_file, img, boxes, dirname='./result/', verticals=None, texts=None):
    """ save text detection result one by one
    Args:
        img_file (str): image file name
        img (array): raw image context
        boxes (array): array of result file
            Shape: [num_detections, 4] for BB output / [num_detections, 8] for QUAD output
    Return:
        None
    """
    img = np.array(img)

    # make result file list
    filename, file_ext = os.path.splitext(os.path.basename(img_file))

    # result directory
    res_file = dirname + "res_" + filename + '.txt'
    res_img_file = dirname + "res_" + filename + '.jpg'

    if not os.path.isdir(dirname):
        os.mkdir(dirname)

    with open(res_file, 'w') as f:
        for i, box in enumerate(boxes):
            # flatten the box coordinates and write them to the result file
            poly = np.array(box).astype(np.int32).reshape((-1))
            strResult = ','.join([str(p) for p in poly]) + '\r\n'
            f.write(strResult)

            # draw the detection polygon on the image
            poly = poly.reshape(-1, 2)
            cv2.polylines(img, [poly.reshape((-1, 1, 2))], True, color=(0, 0, 255), thickness=2) # HERE
            ptColor = (0, 255, 255)
            if verticals is not None:
                if verticals[i]:
                    ptColor = (255, 0, 0)

            # overlay the recognized text next to the polygon, if provided
            if texts is not None:
                font = cv2.FONT_HERSHEY_SIMPLEX
                font_scale = 0.5
                cv2.putText(img, "{}".format(texts[i]), (poly[0][0]+1, poly[0][1]+1), font, font_scale, (0, 0, 0), thickness=1)
                cv2.putText(img, "{}".format(texts[i]), tuple(poly[0]), font, font_scale, (0, 255, 255), thickness=1)

    # Save result image
    cv2.imwrite(res_img_file, img)
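
A rough sketch of the per-box pytesseract attempt (the ocr_boxes helper and its pad value are illustrative, assuming the quadrilateral boxes CRAFT returns):

import cv2
import numpy as np
import pytesseract

def ocr_boxes(img, boxes, pad=2):
    # Crop the axis-aligned bounding rectangle of each detected
    # quadrilateral and run pytesseract on the crop
    results = []
    img_h, img_w = img.shape[:2]
    for box in boxes:
        poly = np.array(box).astype(np.int32).reshape(-1, 2)
        x, y, w, h = cv2.boundingRect(poly)
        crop = img[max(y - pad, 0):min(y + h + pad, img_h),
                   max(x - pad, 0):min(x + w + pad, img_w)]
        # --psm 7: treat each crop as a single line of text
        results.append(pytesseract.image_to_string(crop, config='--psm 7').strip())
    return results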

After your comment, here's the result:

[modified image]

The Tesseract result is good for a first test, but not accurate:

400
300
200

“2615

1950

24
16

Solution

  • When using Pytesseract to extract text, preprocessing the image is extremely important. In general, we want to preprocess the image so that the desired text is black on a white background. To do this, we can use Otsu's threshold to obtain a binary image, then perform morphological operations to filter and remove noise. Here's the pipeline:

    • Convert image to grayscale and resize
    • Otsu's threshold for binary image
    • Invert image and perform morphological operations
    • Find contours
    • Filter using contour approximation, aspect ratio, and contour area
    • Remove unwanted noise
    • Perform text recognition

    After converting to grayscale, we resize the image using imutils.resize(), then apply Otsu's threshold to obtain a binary image. The image is now only black and white, but there is still unwanted noise.

    From here we invert the image and perform morphological operations with a horizontal kernel. This step merges the text into a single contour, where we can filter and remove the unwanted lines and small blobs.

    Now we find contours and filter using a combination of contour approximation, aspect ratio, and contour area to isolate the unwanted sections. The removed noise is highlighted in green.

    Now that the noise is removed, we invert the image again to have the desired text in black, then perform text extraction. I've also noticed that adding a slight blur enhances recognition. Here's the cleaned image we perform text extraction on:

    We give Pytesseract the --psm 6 configuration since we want to treat the image as a uniform block of text. Here's the result from Pytesseract:

    6745 63 6 10.50
    2245 21 18 17
    525 4 22 0.18
    400 4 a 0.50
    300 3 4 0.75
    200 2 3 0.22
    2575 24 3 0.77
    1950 ii 12 133
    

    The output isn't perfect, but it's close. You can experiment with additional configuration settings here.
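
    For instance, since the expected output here is numeric, a character whitelist is one setting worth trying (a sketch; the digit whitelist is an assumption about this particular image, and tessedit_char_whitelist only works with the LSTM engine from Tesseract 4.1 onward):

    # Hypothetical tuning: restrict recognition to digits and a decimal point
    custom_config = '--psm 6 -c tessedit_char_whitelist=0123456789.'
    data = pytesseract.image_to_string(thresh, lang='eng', config=custom_config)
    print(data)

    Full code for the pipeline: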

    import cv2
    import pytesseract
    import imutils
    
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    
    # Resize, grayscale, Otsu's threshold
    image = cv2.imread('1.png')
    image = imutils.resize(image, width=500)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    
    # Invert image and perform morphological operations
    inverted = 255 - thresh
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,3))
    close = cv2.morphologyEx(inverted, cv2.MORPH_CLOSE, kernel, iterations=1)
    
    # Find contours and filter using aspect ratio and area
    cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        area = cv2.contourArea(c)
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.01 * peri, True)
        x,y,w,h = cv2.boundingRect(approx)
        aspect_ratio = w / float(h)
        if (aspect_ratio >= 2.5 or area < 75):
            cv2.drawContours(thresh, [c], -1, (255,255,255), -1)
    
    # Blur and perform text extraction
    thresh = cv2.GaussianBlur(thresh, (3,3), 0)
    data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
    print(data)
    
    cv2.imshow('close', close)
    cv2.imshow('thresh', thresh)
    cv2.waitKey()
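
    If accuracy still varies from line to line, pytesseract can also report a confidence score for each recognized word via image_to_data, which helps pinpoint which regions need better preprocessing. A minimal sketch, reusing the thresh image from above:

    from pytesseract import Output

    # Per-word confidence scores for the same preprocessed image
    details = pytesseract.image_to_data(thresh, lang='eng', config='--psm 6',
                                        output_type=Output.DICT)
    for word, conf in zip(details['text'], details['conf']):
        if word.strip():
            print('{}: confidence {}'.format(word, conf))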