I have two images that I read with OpenCV, and I'm trying to recognise the numbers in them with pytesseract. For one image the numbers are detected correctly; for the other, no numbers are detected at all. Both images are cropped screenshots from the same phone and the same app, so the fonts and alignment should be the same. Below is the code I've used.
import cv2
import pytesseract
import os
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
xconfig='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
plist = [x for x in os.listdir() if x.endswith(".png")]
for pt in plist:
    img = cv2.imread(pt)
    # print the result so it is visible when run as a script
    print(pt, pytesseract.image_to_string(img, config=xconfig))
This is the first image; the numbers are detected correctly here.
The numbers are not detected in the one below.
For the image above, the following characters are detected if no custom config is used: 'lO R Ly Reb yL\n\x0c'
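You should threshold the image before passing it to Tesseract. Here gry is the grayscale version of the image (see the full code at the end), and Otsu's method picks the threshold automatically: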
thr = cv2.threshold(src=gry, thresh=0, maxval=255, type=cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]
Now read the text from the thresholded image:
txt = pytesseract.image_to_string(thr)
print(txt)
Result:
662,157,015,578
Code:
import cv2
import pytesseract
img = cv2.imread("v0cUq.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.threshold(src=gry, thresh=0, maxval=255, type=cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]
txt = pytesseract.image_to_string(thr)
print(txt)
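To see what the THRESH_OTSU + THRESH_BINARY_INV flags do, here is a minimal pure-Python sketch of the same idea on a flat list of grey values. The helper names otsu_threshold and binary_inv are hypothetical, written only for illustration; they are not OpenCV functions, and on ties the threshold chosen here may differ from the one OpenCV picks. The sketch assumes the failing screenshot has light digits on a dark background (the images are not shown here, so this is a guess); in that case the inverted output gives dark digits on a light background, which Tesseract generally handles better.

```python
def otsu_threshold(pixels):
    """Return a threshold t (0..255) that maximises between-class variance.

    Illustrative re-implementation of Otsu's method; not OpenCV's code.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the lower class yet
        if w1 == 0:
            break     # no pixels left in the upper class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalised)
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binary_inv(pixels, t, maxval=255):
    """Mimic THRESH_BINARY_INV: values above t become 0, the rest maxval."""
    return [0 if p > t else maxval for p in pixels]


# One row of a made-up image: dark background (30) with light digits (220).
row = [30, 30, 220, 220, 30, 30]
t = otsu_threshold(row)
print(binary_inv(row, t))  # prints [255, 255, 0, 0, 255, 255]
```

In practice you would keep using cv2.threshold as in the answer; this only shows why the inverted binary image ends up with dark text on a white background.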