The image_processed variable is the attached image.
custom_config = r'--oem 3 --psm 7 -c tessedit_char_whitelist= 0123456789/'
result = pytesseract.image_to_string(image_processed, lang='eng', config=custom_config)
The output:
43659 [44 38
The application takes a screenshot of the screen, crops it to the specified coordinates containing the numbers, and then applies an inverted threshold to get black numbers on a white background. I am trying to read the cropped numbers with pytesseract, but the text it returns is not reliable.
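The preprocessing code itself is not shown here. A minimal sketch of what the crop and inverted threshold might look like with Pillow follows; the function name crop_and_invert, the crop box, and the threshold of 128 are all assumptions for illustration, and it assumes the numbers are light on a dark background in the screenshot.

```python
from PIL import Image


def crop_and_invert(screenshot, box, threshold=128):
    """Crop `box` (left, upper, right, lower) and apply an inverted threshold.

    Hypothetical helper: pixels at or above `threshold` become black (0),
    everything else becomes white (255), so light digits on a dark
    background end up as black digits on a white background.
    """
    region = screenshot.crop(box).convert("L")  # grayscale copy of the region
    return region.point(lambda p: 0 if p >= threshold else 255)


# Synthetic stand-in for a screenshot: black background, one white pixel.
screenshot = Image.new("RGB", (100, 50), "black")
screenshot.putpixel((20, 10), (255, 255, 255))
cropped = crop_and_invert(screenshot, (10, 5, 60, 25))
```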
How can I use pytesseract to read the simple numbers in this image?
The main problems with recognizing the numbers in this picture are solved by adding 10 pixels of white space above and below the image and drawing a black border around its edge, as recommended in the documentation.
Solution using the Pillow library.
import pytesseract
from PIL import Image, ImageOps
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image_processed = Image.open(r"jiyYb.jpg")
image_processed = ImageOps.expand(image_processed, border=10, fill='#ffffff')
image_processed = ImageOps.expand(image_processed, border=1)
custom_config = r'--psm 7 -c tessedit_char_whitelist=" /0123456789"'
result = pytesseract.image_to_string(image_processed, config=custom_config)
print(result)
-------------
5367 /5438
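If the same padding is needed for several crops, the two expand steps can be wrapped in a small helper. This is only a sketch: pad_for_ocr is a hypothetical name, and the defaults simply mirror the 10-pixel white margin and 1-pixel black border used above. It does not call Tesseract, so it can be checked on a synthetic image.

```python
from PIL import Image, ImageOps


def pad_for_ocr(image, margin=10, border=1):
    """Return a copy of `image` with a white margin and a black outer border.

    Hypothetical helper; `margin` and `border` are widths in pixels.
    """
    padded = ImageOps.expand(image, border=margin, fill='#ffffff')  # white margin
    return ImageOps.expand(padded, border=border)  # default fill is black


# Synthetic white strip standing in for the cropped numbers.
sample = Image.new("RGB", (40, 12), "white")
padded_sample = pad_for_ocr(sample)
```

The returned image can be passed to pytesseract.image_to_string with the same custom_config as in the solution above.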
Solution using opencv-python.
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image_processed = cv2.imread('jiyYb.jpg')
# Add 10 pixels of white space above and below the image.
image_processed = cv2.copyMakeBorder(src=image_processed, top=10, bottom=10, left=0, right=0,
                                     borderType=cv2.BORDER_CONSTANT, value=[255, 255, 255])
# Add a 1-pixel border on every side (the default constant value is black).
image_processed = cv2.copyMakeBorder(src=image_processed, top=1, bottom=1, left=1, right=1,
                                     borderType=cv2.BORDER_CONSTANT)
custom_config = r'--psm 7 -c tessedit_char_whitelist=" /0123456789"'
data = pytesseract.image_to_string(image_processed, config=custom_config)
print(data)
cv2.imshow('image_processed', image_processed)
cv2.waitKey(0)