python ocr tesseract python-tesseract

Python Tesseract not recognising number in my image


I've got this picture (preprocessed image) from which I want to extract the numeric values of each line. I'm using pytesseract, but it doesn't return any usable results for this image. I've tried several config options from other questions, like "--psm 13 --oem 3" or whitelisting digits, but nothing works. I usually get just one or two characters, or about five dots/dashes, but nothing even remotely resembling the size of my input.

I hope someone can help me. Cheers in advance for your time.

pytesseract version: 0.3.8
tesseract version: 5.0.0-alpha.20210506


Solution

  • Try --psm 4, which is more appropriate for your image: it tells Tesseract to assume a single column of text of variable sizes. I also recommend rethinking the image pre-processing. Tesseract is not perfect and needs a good input image to work well.

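    import cv2 as cv
    import pytesseract as tsr
    
    # OpenCV loads images in BGR order; convert to RGB before passing to pytesseract
    img = cv.imread('41DAx.jpg')
    img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
    
    # --psm 4: assume a single column of text of variable sizes
    # The whitelist restricts the output to digits and commas
    config = '--psm 4 -c tessedit_char_whitelist=0123456789,'
    text = tsr.image_to_string(img, config=config)
    print(text)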
    

    The code above did not detect every digit in the image, but it got most of them. With a bit more image pre-processing, you can probably reach your objective.