How to extract text from an image using pytesseract?

I'm using pytesseract to try to extract numbers from an image.

I'm trying to extract the three numbers from this picture.

A straightforward method using pytesseract is:

from PIL import Image
from pytesseract import pytesseract
text = pytesseract.image_to_string(Image.open("uploaded_image.png"))
print(text)

But this prints a blank string.

Why can't it extract the numbers as it can with ordinary text?

Solution

Your image needs some preprocessing before pytesseract can process it reliably.

The following script does this with cv2.adaptiveThreshold(), cv2.findContours(), and cv2.drawContours(), then converts the image to grayscale and thresholds its pixel values:

import numpy as np
import cv2
from PIL import Image
import pytesseract

img = cv2.imread('uploaded_image.png', cv2.IMREAD_COLOR)
img = cv2.blur(img, (5, 5))

#HSV (hue, saturation, value)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

#Applying threshold on pixels' Value (or Brightness)
thresh = cv2.adaptiveThreshold(v, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)

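#Finding contours (OpenCV 4.x returns two values; 3.x returns three)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

#Filling contours in white (thickness -1 fills them)
#Pass the contour list directly: np.array() on contours of different lengths raises an error in recent NumPy versions
filled = cv2.drawContours(img, contours, -1, (255, 255, 255), -1)

#To grayscale
grayImage = cv2.cvtColor(filled, cv2.COLOR_BGR2GRAY)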

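#And thresholding it (the two assignments run in order)
#Setting the brightest pixels (> 200, including the filled contours) to 0
grayImage[grayImage > 200] = 0
#Setting every pixel below 100 to white; this also catches the pixels just set to 0
grayImage[grayImage < 100] = 255
#Write the temp file
cv2.imwrite('temp.png', grayImage)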

#Read it with tesseract (Tesseract 4+ syntax: --psm sets the page segmentation mode, -c sets a variable)
text = pytesseract.image_to_string(Image.open('temp.png'), config='--psm 6 -c tessedit_char_whitelist=0123456789')

#Output
print("####  Raw text ####")
print(text)
print()
print("#### Extracted digits ####")
print([''.join([y for y in x if y.isdigit()]) for x in text.split('\n')])

Output

####  Raw text ####
93
31
92

#### Extracted digits ####
['93', '31', '92']
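
The final list comprehension in the script can be isolated and checked without running OCR at all. The sketch below wraps it in a helper, `extract_digit_groups` (a hypothetical name, not part of pytesseract), and also drops lines that contain no digits, which the original comprehension would keep as empty strings. The sample string is made up for illustration and merely stands in for Tesseract output.

```python
def extract_digit_groups(text):
    """Return one string of digits per line of OCR text, skipping lines without digits."""
    groups = []
    for line in text.splitlines():
        # Keep only the digit characters on this line
        digits = ''.join(ch for ch in line if ch.isdigit())
        if digits:  # skip blank lines and lines with no digits
            groups.append(digits)
    return groups


# Made-up sample with a stray space, a blank line, and a misread letter
sample = "93\n3 1\n\n9z2\n"
print(extract_digit_groups(sample))  # ['93', '31', '92']
```

Filtering after the fact like this is a safety net: even with a character whitelist, stray spaces or empty lines may still appear in the raw text.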

Processed image:

EDIT

Updated the answer to use the cv2 library and to get all the digits from the image.
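
Getting only digits back depends partly on the options handed to Tesseract through pytesseract's `config` argument. As a rough sketch, the hypothetical helper below (`build_digit_config` is not a pytesseract function) assembles such an option string. It assumes Tesseract 4 or later, where the page segmentation mode is written `--psm` and variables such as `tessedit_char_whitelist` are set with `-c`.

```python
def build_digit_config(psm=6, whitelist="0123456789"):
    """Build a Tesseract option string that restricts recognition to `whitelist`."""
    # Tesseract 4+ defines page segmentation modes 0 through 13
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


config = build_digit_config()
print(config)  # --psm 6 -c tessedit_char_whitelist=0123456789
```

The resulting string would be passed as `pytesseract.image_to_string(image, config=config)`. Mode 6 tells Tesseract to assume a single uniform block of text. Other modes may suit a different layout better.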