Tags: python, ocr, python-tesseract

Get numbers from cropped image pytesseract


I have a cropped image and I am trying to extract the numbers from it. Here's the code I am using:

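import cv2
import pytesseract

image = cv2.imread('Cropped.png')

# Grayscale, light blur, then inverted Otsu threshold (dark pixels become white)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morphological opening removes small specks of noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
# Invert back before passing the image to Tesseract
invert = 255 - opening
# --psm 6: assume a single uniform block of text
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)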

Here's the sample cropped image: (image)

All I got was some of the numbers, not all of them. How can I enhance such an image so that only the numbers are extracted?

I also tried the code on this image, but it doesn't return the correct numbers: (image)


Solution

  • You can solve this in three main steps:

    Upsampling: for accurate recognition. Otherwise Tesseract may misinterpret the digits.

    Thresholding: keeps only the features of the image.

    Configuration setting: tells Tesseract to recognize digits.
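
To make the thresholding step less of a black box, here is a minimal pure-Python sketch of the idea behind Otsu's method, which `cv2.THRESH_OTSU` selects in the code further down. The function name `otsu_threshold` and the flat list of pixel values are assumptions made for illustration; this is not OpenCV's implementation, only the same principle (pick the threshold that maximizes the variance between the two pixel classes).

```python
def otsu_threshold(pixels):
    """Return a threshold t (0-255) that best separates the pixel values.

    `pixels` is a flat iterable of integers in 0..255 (hypothetical input
    format chosen for this sketch). Values <= t form one class and
    values > t form the other.
    """
    # Histogram of grey levels
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low = 0      # number of pixels with value <= t
    sum_low = 0         # sum of those pixel values
    best_t = 0
    best_score = -1.0

    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue            # nothing in the low class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break               # everything is in the low class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance (up to a constant factor)
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score = score
            best_t = t
    return best_t


# Two clearly separated groups: dark "ink" and bright "paper"
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
binary = [255 if p > t else 0 for p in sample]
```

With two well-separated groups like the sample above, any threshold between them scores equally, and this sketch keeps the first one it finds.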


    Result
    Upsampling: (image)
    Threshold: (image)
    Pytesseract: 277032200746

    Code:

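    import cv2
    import pytesseract
    
    # Load the cropped image (the second sample is "FX2in.png")
    img1 = cv2.imread("kEpyN.png")  # "FX2in.png"
    gry1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    # Upsample by 2x so the digits are larger for Tesseract
    (h, w) = gry1.shape[:2]
    gry1 = cv2.resize(gry1, (w*2, h*2))
    # Otsu picks the binarization threshold automatically
    thr1 = cv2.threshold(gry1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    # "digits" names Tesseract's config file for numeric output
    txt1 = pytesseract.image_to_string(thr1, config="digits")
    # Drop whitespace and punctuation from the OCR output
    print("".join(t for t in txt1 if t.isalnum()))
    cv2.imshow("thr1", thr1)
    cv2.waitKey(0)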
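
If stray letters still slip through, one possible variation is to pass an explicit character whitelist and to filter the output down to the characters 0-9 only (`isalnum()` above also lets letters through). The sketch below is an assumption-laden illustration, not part of the original answer: the helper names `digits_config` and `keep_digits` are hypothetical, and how strictly `tessedit_char_whitelist` is honored may depend on the installed Tesseract version and engine.

```python
def digits_config(psm=7):
    """Build a Tesseract config string limited to the characters 0-9.

    --psm 7 treats the image as a single line of text; the default here is
    an assumption that fits a one-line crop, adjust it for other layouts.
    """
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def keep_digits(text):
    """Keep only the ASCII digits from an OCR result string."""
    return "".join(ch for ch in text if ch in "0123456789")


# Intended use with the thresholded image from the code above
# (left as comments because it needs Tesseract and the image file):
#   txt1 = pytesseract.image_to_string(thr1, config=digits_config())
#   print(keep_digits(txt1))
```

The filter alone is safe to apply in any case, since it only post-processes the string that Tesseract returns.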
    

    Update:


    Most probably, a version mismatch causes the extra words and digits.

    One way to solve this is to take a region of the image.

    For instance, from the thresholded image:

    (h_thr, w_thr) = thr1.shape[:2]
    thr1 = thr1[0:h_thr-10, int(w_thr/2)-400:int(w_thr/2)+200]
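
The fixed offsets in the crop above assume an image wide enough that `int(w_thr/2)-400` stays non-negative; with a narrower image, a negative start index counts from the right edge in NumPy slicing and can produce an empty or unexpected crop. A minimal sketch of clamping the bounds first is shown here; the helper name `centered_crop_bounds` is hypothetical, and the default offsets are simply the ones used above.

```python
def centered_crop_bounds(width, left=400, right=200):
    """Return (x0, x1) around the horizontal center, clamped to [0, width]."""
    center = width // 2
    x0 = max(0, center - left)       # never go below the left edge
    x1 = min(width, center + right)  # never go past the right edge
    return x0, x1


# Intended use with the thresholded image (left as comments because it
# needs the image from the code above):
#   (h_thr, w_thr) = thr1.shape[:2]
#   x0, x1 = centered_crop_bounds(w_thr)
#   thr1 = thr1[0:max(0, h_thr - 10), x0:x1]

wide = centered_crop_bounds(1000)
narrow = centered_crop_bounds(600)
```

For an image wide enough for both offsets, the result matches the slice in the answer; only the narrow case behaves differently.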
    

    The result will be:

    (image)

    Now, if you run the OCR on this region, the output should be:

    277032200746