
Read text below barcode pytesseract python


I am trying to read the number printed below a barcode in an image. The same code works fine on some other images, but not on this one. Here is the image: [image]

Here is the code so far:

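import re

import cv2
import pytesseract

def readNumber():
    # sTemp holds the path to the input image and is defined elsewhere
    image = cv2.imread(sTemp)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3,3), 0)
    # Otsu threshold, inverted so the text becomes white on black
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    # Morphological opening removes small specks of noise
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
    # Invert back to black text on a white background for Tesseract
    invert = 255 - opening
    data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6 -c tessedit_char_whitelist=0123456789')
    print(data)
    try:
        # First run of nine digits that is followed by a non-digit
        data = re.findall(r'(\d{9})\D', data)[0]
    except IndexError:
        data = ''
    return data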

I call it like this:

readNumber()

Here's another example: [image]

And this is the last example, I promise: [image]

I tried the following with the third example and it works:

img = cv2.imread("thisimage.png")
blur = cv2.GaussianBlur(img, (3,3), 0)
#gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
txt = pytesseract.image_to_string(blur)
print(txt)

But how do I adapt the code so that it works for all three cases? I tried the following, but couldn't get the third case to work:

import pytesseract, cv2, re

def readNumber(img):
    img = cv2.imread(img)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    try:
        txt = pytesseract.image_to_string(gry)
        #txt  = re.findall('(\d{9})\D', txt)[0]
    except:
        thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 4)
        txt = pytesseract.image_to_string(thr, config="digits")
        #txt  = re.findall('(\d{9})\D', txt)[0]

    return txt

# M5Pr5         191876320
# RWgrP         202131290
# 6pVH4         193832560
print(readNumber('M5Pr5.png'))

Solution

  • You don't need any preprocessing or configuration for the input image, since there are no artifacts in it.

    import cv2
    import pytesseract
    
    img = cv2.imread("RWgrP.png")
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    txt = pytesseract.image_to_string(gry)
    print(txt)
    

    Result:

    202131290
    

    My Tesseract version is 4.1.1.

    Update-1


    The second image requires preprocessing.

    If you apply adaptive thresholding, the result is:

    [image]

    However, the OCR output still contains unwanted characters. If you set the configuration to digits, the result will be:

    193832560
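
Even with the digits configuration, the raw OCR string can carry trailing whitespace or stray characters, so a final filter is a reasonable safeguard. The question's regex `(\d{9})\D` needs a non-digit after the number, which means it misses a number sitting at the very end of the string. The sketch below uses lookarounds instead; `extract_number` is a hypothetical helper name, not part of the original answer.

```python
import re

# Hypothetical helper (an assumption, not from the original answer):
# match a run of exactly nine digits that is not part of a longer digit run.
NINE_DIGITS = re.compile(r"(?<!\d)\d{9}(?!\d)")

def extract_number(ocr_text):
    """Return the first standalone 9-digit run in ocr_text, or '' if none."""
    match = NINE_DIGITS.search(ocr_text)
    return match.group(0) if match else ""

# Works whether or not anything follows the digits
print(extract_number("193832560\n\x0c"))
print(extract_number("code: 202131290"))
```

Because the lookarounds reject longer digit runs, a misread such as a 10-digit string yields an empty result instead of a truncated number.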
    

    Update-2


    For the third image, you need to change the adaptive method. Using ADAPTIVE_THRESH_MEAN_C will result in:

    191876320
    

    The rest is the same.

    Code:

    import cv2
    import pytesseract
    
    img = cv2.imread("6pVH4.png")
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 4)
    txt = pytesseract.image_to_string(thr, config="digits")
    print(txt)
    cv2.imshow("thr", thr)
    cv2.waitKey(0)
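
To handle all three images with one function, one option is to try the approaches above in order and stop at the first result that contains a 9-digit number. Note that the `try`/`except` in the question only falls through when an exception is raised, not when the OCR result is simply wrong, so the sketch validates each result explicitly. This is a minimal sketch under assumptions: `first_valid`, `ocr_attempts`, and `read_number` are hypothetical names, and the 9-digit format is taken from the three sample numbers in the question.

```python
import re

# Assumption: the wanted number is always exactly nine digits long.
NINE_DIGITS = re.compile(r"(?<!\d)\d{9}(?!\d)")

def first_valid(candidates):
    """Return the first 9-digit number found in an iterable of OCR strings."""
    for text in candidates:
        match = NINE_DIGITS.search(text)
        if match:
            return match.group(0)
    return ""

def ocr_attempts(path):
    """Yield OCR text from the three approaches above, simplest first."""
    # Imported lazily so first_valid() can be used without OpenCV/Tesseract.
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(path)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # 1) Plain grayscale, no configuration
    yield pytesseract.image_to_string(gry)

    # 2) Gaussian, then 3) mean adaptive threshold, both with the digits config
    for method in (cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.ADAPTIVE_THRESH_MEAN_C):
        thr = cv2.adaptiveThreshold(gry, 255, method, cv2.THRESH_BINARY, 51, 4)
        yield pytesseract.image_to_string(thr, config="digits")

def read_number(path):
    """Run the attempts lazily; later steps only execute if earlier ones fail."""
    return first_valid(ocr_attempts(path))
```

It would be called as `read_number("M5Pr5.png")`. Since `ocr_attempts` is a generator, the thresholding steps are skipped whenever the plain grayscale read already produces a valid number.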