python ocr python-tesseract

How to use pytesseract to read text from this image with simple numbers?

The image_processed variable holds the attached image.

    custom_config = r'--oem 3 --psm 7 -c tessedit_char_whitelist= 0123456789/'
    result = pytesseract.image_to_string(image_processed, lang='eng', config=custom_config)

The output:

43659 [44 38

The application takes a screenshot of the screen and crops the region containing the numbers at the specified coordinates, then applies an inverted threshold to get black numbers on a white background. I am trying to read the cropped numbers with pytesseract, but the output is not reliable.


Solution

Most of the recognition problems with this picture go away after adding 10 pixels of white space above and below the image and drawing a 1-pixel black border around it, as recommended in the documentation.

Solution using the Pillow library.

    import pytesseract
    from PIL import Image, ImageOps

    # Path to the Tesseract executable (adjust for your installation).
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    image_processed = Image.open(r"jiyYb.jpg")

    # Add 10 px of white padding, then a 1 px border (the default fill is black).
    image_processed = ImageOps.expand(image_processed, border=10, fill='#ffffff')
    image_processed = ImageOps.expand(image_processed, border=1)

    # --psm 7: treat the image as a single line of text; allow only digits, "/" and space.
    custom_config = r'--psm 7 -c tessedit_char_whitelist=" /0123456789"'
    result = pytesseract.image_to_string(image_processed, config=custom_config)
    print(result)

The output:

5367 /5438

Solution using opencv-python.

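    import cv2
    import pytesseract

    # Path to the Tesseract executable (adjust for your installation).
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    image_processed = cv2.imread('jiyYb.jpg')

    # Add 10 px of white padding above and below the image.
    image_processed = cv2.copyMakeBorder(src=image_processed, top=10, bottom=10, left=0, right=0,
                                         borderType=cv2.BORDER_CONSTANT, value=[255, 255, 255])
    # Add a 1 px border on all sides (the default border value is black).
    image_processed = cv2.copyMakeBorder(src=image_processed, top=1, bottom=1, left=1, right=1,
                                         borderType=cv2.BORDER_CONSTANT)

    # --psm 7: treat the image as a single line of text; allow only digits, "/" and space.
    custom_config = r'--psm 7 -c tessedit_char_whitelist=" /0123456789"'
    data = pytesseract.image_to_string(image_processed, config=custom_config)
    print(data)

    # Show the padded image for a visual check; press any key to close the window.
    cv2.imshow('image_processed', image_processed)
    cv2.waitKey(0)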
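
A side note on the config string from the question: the space after tessedit_char_whitelist= may be one reason the output there contained a "[" character. The sketch below is not pytesseract code. It only uses the standard-library shlex module to show how the two config strings split into command-line arguments, on the assumption that pytesseract tokenizes the config in a similar shell-like way before calling tesseract. Exact behaviour can differ by pytesseract version and platform (Windows quoting in particular), and split_config is a hypothetical helper name used only for illustration.

```python
import shlex

# Config string from the question (note the space after "=").
question_config = r'--oem 3 --psm 7 -c tessedit_char_whitelist= 0123456789/'
# Config string from the solutions (value quoted, no stray space).
answer_config = r'--psm 7 -c tessedit_char_whitelist=" /0123456789"'


def split_config(config):
    """Hypothetical helper: split a config string POSIX shell style."""
    return shlex.split(config)


question_args = split_config(question_config)
answer_args = split_config(answer_config)

# In the question's string the whitelist value is empty and the
# characters end up as a separate, unrelated argument.
print(question_args)
# In the solutions' string the quoted value stays attached to the option.
print(answer_args)
```

Under that assumption, the whitelist in the question's string is effectively empty, while the quoted form used in the solutions keeps the space, the slash, and the digits together as the option value.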