
Python pytesseract extract number from various images


I have various types of images like these:

[six example images]

As you can see, they are all fairly similar, but I cannot manage to extract the number from them reliably.

So far my code consists of the following:

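    import re

    import cv2
    import numpy as np
    from pytesseract import image_to_string

    # img is assumed to be loaded already (e.g. with cv2.imread)
    # keep only the pixels whose colour falls inside [lower, upper]
    lower = np.array([250,200,90], dtype="uint8")
    upper = np.array([255,204,99], dtype="uint8")

    mask = cv2.inRange(img, lower, upper)
    res = cv2.bitwise_and(img, img, mask=mask)

    data = image_to_string(res, lang="eng", config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
    numbers = int(''.join(re.findall(r'\d+', data)))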

I tried tweaking the psm parameter (6, 8 and 13). Each value works for some of these examples, but none works for all of them, and I have no idea how to get around this.

Another solution proposed is:

    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    (h, w) = gry.shape[:2]
    gry = cv2.resize(gry, (w*2, h*2))
    erd = cv2.erode(gry, None, iterations=1)
    thr = cv2.threshold(erd, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    bnt = cv2.bitwise_not(thr)

However, on the first picture, bnt gives:

[image: bnt output for the first picture]

And then pytesseract reads 460.

Any ideas, please?


Solution

  • My approach:



    Upsampling is needed for accurate recognition. Doubling the image size makes the digits large enough to read.

    Erosion is a morphological operation that shrinks bright regions by removing pixels at their boundaries. It cleans up the strokes around the digits, which makes them easier to detect.

    Thresholding (binary or inverse binary) separates the digits from the background.

    Bitwise-not inverts every pixel, swapping black and white in the thresholded image.

    You can learn more methods by reading Improving the quality of the output.


    [Image table with columns Erosion, Threshold and Bitwise-not, five rows of result images; images not preserved]

    Update


    The first image is easy to read because it needs no resizing, erosion, or thresholding. Please read How to Improve Quality of Tesseract.

    Result:

    1460
    720
    3250
    3146
    2681
    1470
    

    Code:

    import cv2
    import pytesseract

    img_lst = ["oqWjd.png", "YZDt1.png", "MUShJ.png", "kbK4m.png", "POIK2.png", "4W3R4.png"]

    for i, img_nm in enumerate(img_lst):
        img = cv2.imread(img_nm)
        gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        (h, w) = gry.shape[:2]
        if i == 0:
            # first image: grayscale only, no resize/erode/threshold
            thr = gry
        else:
            # upsample two times, then erode with the default 3x3 kernel
            gry = cv2.resize(gry, (w * 2, h * 2))
            erd = cv2.erode(gry, None, iterations=1)
            if i == len(img_lst)-1:
                # last image: inverse-binary Otsu threshold
                thr = cv2.threshold(erd, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
            else:
                thr = cv2.threshold(erd, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
        # invert the image before passing it to Tesseract
        bnt = cv2.bitwise_not(thr)
        txt = pytesseract.image_to_string(bnt, config="--psm 6 digits")
        # keep only alphanumeric characters (drops commas, dots, whitespace)
        print("".join([t for t in txt if t.isalnum()]))
        cv2.imshow("bnt", bnt)
        cv2.waitKey(0)
    
    

    If you want to display the comma in the result, change the print("".join([t for t in txt if t.isalnum()])) line to print(txt).
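If you need an integer rather than printed text, the filtering step can be wrapped in a small helper. The sketch below is an illustration, not part of the tested code above: parse_number and read_number are hypothetical names, and the OCR call is passed in as a function so that the psm values 6, 8 and 13 mentioned in the question can be tried in turn. Note that the first mode that yields digits is not guaranteed to yield the right ones.

```python
import re
from typing import Callable, Iterable, Optional


def parse_number(text: str) -> Optional[int]:
    """Join every digit run in the OCR text; return None when there is none."""
    digits = "".join(re.findall(r"\d+", text))
    return int(digits) if digits else None


def read_number(image, ocr: Callable[..., str],
                psm_modes: Iterable[int] = (6, 8, 13)) -> Optional[int]:
    """Try each page segmentation mode and return the first number found."""
    for psm in psm_modes:
        text = ocr(image, config=f"--psm {psm} digits")
        number = parse_number(text)
        if number is not None:
            return number
    return None
```

With the loop above it would be called as read_number(bnt, pytesseract.image_to_string). Returning None instead of calling int() on an empty string avoids a ValueError when Tesseract finds no digits.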

    Note that for the last image in the list (the i == len(img_lst)-1 branch) the threshold method changes from binary to inverse binary. Binary thresholding does not work accurately on every image, so you need to switch the method for the images where it fails.
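Hard-coding the image index only works for a fixed list. One possible alternative, which is not what the code above does and has not been checked against these images, is to pick the polarity from the thresholded image itself. The sketch assumes the digits cover less than half of the image, so the majority value is background; background_to_white is a hypothetical helper name.

```python
import numpy as np


def background_to_white(binary: np.ndarray) -> np.ndarray:
    """Return a 0/255 image whose majority (assumed background) pixels are white."""
    white_fraction = np.count_nonzero(binary == 255) / binary.size
    # Assumption: text is the minority class, so invert when white is rarer.
    return binary if white_fraction >= 0.5 else 255 - binary


# Synthetic example: a small white blob on a black background gets inverted.
demo = np.zeros((10, 10), dtype=np.uint8)
demo[4:6, 3:7] = 255
fixed = background_to_white(demo)
```

Applied to thr, this would give a consistent dark-on-light image without an if per file, at the cost of failing on images where the digits fill most of the frame.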