python-3.x, python-tesseract, text-extraction, amazon-textract

How to extract text from an image with a variety of noisy texts and numbers?


I have an image here:

[image: metre]

I need to extract the meter reading from this image, which is "0005053" at the centre.

I have tried pytesseract as follows:

import pytesseract
from PIL import Image
text = pytesseract.image_to_string(Image.open("Screen_Shot_2564-08-25_at_11.23.13.png"))
print(text)

The output I got was ' \n\x0c'.

Another service I found was AWS Textract, which is extremely accurate, but I couldn't find a way to call it from Python. Any leads there would be appreciated.

Any idea on how to resolve this?

Thanks


Solution

  • If you want to go the Textract route, you can call it from Python using boto3, the AWS SDK for Python. You can also try Amazon Rekognition for the same task and compare its accuracy and cost.
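
A minimal sketch of the Textract route, assuming boto3 is installed and AWS credentials and a region are already configured. It calls Textract's `detect_document_text` with the raw image bytes and keeps the `LINE` blocks. The helper names (`extract_lines`, `find_meter_reading`, `read_meter`) and the 7-digit pattern are illustrative assumptions based on the reading in the question, not part of any library.

```python
import re


def extract_lines(response):
    """Return the text of every LINE block in a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def find_meter_reading(lines, digits=7):
    """Return the first standalone run of exactly `digits` digits, or None.

    Hypothetical helper: the 7-digit default matches "0005053" from the
    question; adjust it for other meters.
    """
    pattern = re.compile(rf"(?<!\d)\d{{{digits}}}(?!\d)")
    for line in lines:
        match = pattern.search(line)
        if match:
            return match.group(0)
    return None


def read_meter(image_bytes, client=None):
    """Send image bytes to Textract and return the detected reading, if any."""
    if client is None:
        # Imported lazily so the parsing helpers can be used without boto3.
        import boto3
        client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return find_meter_reading(extract_lines(response))


# Usage (needs AWS credentials; not run here):
# with open("Screen_Shot_2564-08-25_at_11.23.13.png", "rb") as f:
#     print(read_meter(f.read()))
```

For Rekognition, the comparable call is `detect_text(Image={"Bytes": image_bytes})` on a `boto3.client("rekognition")`; its response lists results under `TextDetections`, each with a `DetectedText` and a `Type` of `LINE` or `WORD`, so only `extract_lines` would need to change.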