Tags: python, opencv, ocr, tesseract, python-tesseract

Pytesseract not recognizing text even though it is visible in the picture


I have an image from which I want to extract text.

[Image: text_image]

I am using the following code to extract the text:

import pytesseract
pytesseract.image_to_string(text_image, config='-l eng --psm 7')

However, the output is wrong about 80% of the time: it returns strings such as "mE Smart Meter Gateway" or "RTE Smart Meter Gateway". The problem is mainly with the first two characters. I am using Python 3. Any help improving the text detection would be appreciated.


Solution

  • After applying adaptive thresholding, I was able to read the text. First, blur the image:

    import cv2

    blurred = cv2.GaussianBlur(text_image, (7, 7), 0)

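    Then apply adaptive thresholding. cv2.adaptiveThreshold only accepts a single-channel 8-bit image, so text_image must be grayscale (convert a colour image with cv2.cvtColor(text_image, cv2.COLOR_BGR2GRAY) first):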

    thresh = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 4)

    Finally, extract the text.

    import pytesseract

    text = pytesseract.image_to_string(thresh, config='-l eng --psm 7')