python, image, python-tesseract

pytesseract can't recognize the number 1


I'm running a script that returns the value and position of each digit on a scrambled numpad. But when it tries to recognize the 1, it returns either 71 or 7.

This is the image I'm extracting the 1 from.

This is the script I'm running:

import cv2
import PIL.ImageOps
import pytesseract
from PIL import Image

# numero, numeros and vector_de_vectores come from earlier in the script (not shown)
numero.save(r'C:\imagenes\numeros\numero.png')
image = Image.open(r'C:\imagenes\numeros\numero.png')
inverted_image = PIL.ImageOps.invert(image)
inverted_image.save(r'C:\imagenes\numeros\numero.png')

image = cv2.imread(r'C:\imagenes\numeros\numero.png')

numero = int(pytesseract.image_to_string(image, lang='spa', config='--psm 6 digits'))
print("numero :", numero)

# Retry with the English model when the Spanish one returns 7 or an unexpected value
if numero == 7 or numero not in numeros:
    numero_eng = int(pytesseract.image_to_string(image, lang='eng', config='--psm 6 digits'))
    if numero_eng != 7:
        numero = 1
    else:
        numero = numero_eng
print("numero:", numero)

vector = 930, 425, numero
vector_de_vectores.append(vector)

Solution

    1- Apply adaptive thresholding.

    2- Set the tesseract configuration to --psm 7, since you are trying to recognize a single line of text (see all psm modes).
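
To make step 1 more concrete, here is a minimal pure-Python sketch of what an inverted adaptive mean threshold computes: each pixel is compared against the mean of its surrounding block minus a constant. The function name `adaptive_mean_threshold_inv` is hypothetical, and the edge handling (replicating border pixels) and floating-point mean are assumptions, so results may differ slightly from `cv2.adaptiveThreshold`. It is meant as an illustration only, not a replacement for OpenCV.

```python
def adaptive_mean_threshold_inv(gray, block_size, c, max_value=255):
    """Illustrative inverted adaptive mean threshold on a list-of-lists image.

    A pixel becomes max_value when it is at least `c` darker than the mean
    of its block_size x block_size neighborhood, and 0 otherwise.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    height, width = len(gray), len(gray[0])
    radius = block_size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for dy in range(-radius, radius + 1):
                # Assumption: clamp coordinates so edge pixels are replicated
                yy = min(max(y + dy, 0), height - 1)
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), width - 1)
                    total += gray[yy][xx]
            local_mean = total / (block_size * block_size)
            # Inverted binary: pixels well below the local mean turn white
            if gray[y][x] <= local_mean - c:
                out[y][x] = max_value
    return out


# A light 5x5 patch with one dark pixel in the middle
patch = [[200] * 5 for _ in range(5)]
patch[2][2] = 50
result = adaptive_mean_threshold_inv(patch, block_size=3, c=10)
```

In this toy patch only the dark center pixel ends up white, which is the effect that separates a dark digit from an unevenly lit background.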


    Result of adaptive thresholding:

    [image]

    When you read the thresholded image:

    txt = pytesseract.image_to_string(thr, config="--psm 7")
    print(txt)
    

    Result:

    1
    

    Code:


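    import cv2
    import pytesseract

    # Load the image and convert it to grayscale
    img = cv2.imread("tUh0U.png")
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Inverted adaptive mean threshold: max value 252, block size 31, constant C = 61
    thr = cv2.adaptiveThreshold(gry, 252, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY_INV, 31, 61)
    # --psm 7 treats the image as a single text line
    txt = pytesseract.image_to_string(thr, config="--psm 7")
    print(txt)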
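
The string returned by `image_to_string` can carry trailing whitespace, and on a bad read it may be empty or contain extra characters (such as the 71 from the question), in which case a bare `int(...)` either raises or silently accepts a wrong value. A minimal sketch of a stricter conversion follows; `parse_single_digit` and its `allowed` parameter are hypothetical names introduced here, not part of pytesseract.

```python
import re


def parse_single_digit(ocr_text, allowed=range(10)):
    """Return the digit in ocr_text, or None if it is not exactly one allowed digit."""
    # Drop surrounding whitespace such as newlines or form feeds
    cleaned = ocr_text.strip()
    # Accept only a single ASCII digit; anything else counts as a failed read
    if re.fullmatch(r"[0-9]", cleaned) is None:
        return None
    value = int(cleaned)
    return value if value in allowed else None
```

With the code above it could be called as `parse_single_digit(txt)`; a `None` result signals that the key should be re-read or handled separately instead of guessing a value.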