Pytesseract doesn't detect numbers in an image


I have two images that I read with OpenCV, and I'm trying to recognise the numbers in them with pytesseract. The numbers in one image are detected correctly; in the other, no numbers are detected at all. Both images are cropped screenshots from the same phone, taken in the same app, so the fonts and alignment should be the same. Below is the code I used.

import cv2
import pytesseract
import os

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
xconfig='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'

plist = [x for x in os.listdir() if x.endswith(".png")]
for pt in plist:
    img = cv2.imread(pt)
    print(pytesseract.image_to_string(img, config=xconfig))

This is the first image; its numbers are detected correctly: [image: cropped1]

The numbers in the image below are not detected: [image: cropped2]

Without any custom config, the following characters are detected in the second image: 'lO R Ly Reb yL\n\x0c'


Solution

  • Threshold the image before running OCR. Convert it to grayscale, then apply an inverted binary threshold with Otsu's method:

    thr = cv2.threshold(src=gry, thresh=0, maxval=255, type=cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]
    

    Now run OCR on the thresholded image:

    txt = pytesseract.image_to_string(thr)
    print(txt)
    

    Result:

    662,157,015,578
    

    Code:

    import cv2
    import pytesseract
    
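    img = cv2.imread("v0cUq.png")
    # Otsu thresholding needs a single-channel 8-bit image
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # With THRESH_OTSU the threshold is chosen automatically (thresh=0 is ignored);
    # THRESH_BINARY_INV inverts the result
    thr = cv2.threshold(src=gry, thresh=0, maxval=255, type=cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]
    txt = pytesseract.image_to_string(thr)
    print(txt)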
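
To make the `cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV` flags less opaque, here is a rough pure-Python sketch of the idea on a flat list of 8-bit grey values. The function names `otsu_threshold` and `binary_inv` are made up for this illustration, and this is not OpenCV's implementation: where several thresholds separate the classes equally well, OpenCV may pick a different one.

```python
def otsu_threshold(pixels):
    """Pick the threshold t (0..255) that maximises the between-class variance
    of the two groups {p <= t} and {p > t}."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, 0.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binary_inv(pixels, t, maxval=255):
    """Inverted binary threshold: values above t become 0, the rest maxval."""
    return [0 if p > t else maxval for p in pixels]


# Toy data: a dark group (20, 30) and a bright group (220, 230).
pixels = [20] * 6 + [30] * 4 + [220] * 3 + [230] * 2
t = otsu_threshold(pixels)
inverted = binary_inv(pixels, t)
```

In this toy data the chosen threshold falls between the dark and bright groups, so after inversion the dark values become 255 and the bright values become 0. On a screenshot with light text on a dark background, that is what turns the digits into dark text on a light background.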
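
The same preprocessing could be folded into the loop from the question, roughly as sketched below. The helper names `png_files` and `read_digits` are hypothetical. The sketch assumes every image has light text on a dark background like the second screenshot; for dark text on a light background, `cv2.THRESH_BINARY` is probably the better choice. It reuses the question's config, whose whitelist restricts recognition to digits, so separators such as the commas in the result above should not be expected in the output.

```python
from pathlib import Path

# Config string taken from the question (digits-only whitelist).
DIGIT_CONFIG = "--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789"


def png_files(folder="."):
    """Return the .png files in `folder`, sorted by path."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".png")


def read_digits(path, config=DIGIT_CONFIG):
    """Grayscale, apply an inverted Otsu threshold, then OCR one image."""
    # Imported here so png_files() stays usable without OpenCV or Tesseract.
    import cv2
    import pytesseract

    img = cv2.imread(str(path))
    if img is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f"could not read image: {path}")
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thr = cv2.threshold(gry, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]
    return pytesseract.image_to_string(thr, config=config).strip()
```

Calling `read_digits(p)` for each `p` in `png_files()` reproduces the question's loop with the thresholding step added. On Windows, `pytesseract.pytesseract.tesseract_cmd` still needs to be set as in the question.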