Pytesseract: UnicodeDecodeError: 'charmap' codec can't decode byte


I'm running a large number of OCRs on screenshots with Pytesseract. This works well in most cases, but a small number of images cause this error:

pytesseract.image_to_string(image,None, False, "-psm 6")
Pytesseract: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2: character maps to <undefined>

I'm using Python 3.4. Any suggestions on how to prevent this error (other than just a try/except) would be very helpful.


Solution

  • Use Unidecode

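    from PIL import Image  # required for Image.open below
    from unidecode import unidecode
    import pytesseract

    # Run OCR, then transliterate any non-ASCII characters to plain ASCII
    strs = pytesseract.image_to_string(Image.open('binarized_image.png'))
    strs = unidecode(strs)
    print(strs)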
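
A possible explanation for the error, offered as an assumption rather than something confirmed in the question: the bytes Tesseract wrote are UTF-8, but they were decoded with a Windows code page such as cp1252, in which byte 0x9d is undefined. The UTF-8 encoding of a right double quotation mark (U+201D) is e2 80 9d, which would put 0x9d at position 2, matching the message in the question. The standard-library sketch below reproduces that failure and then decodes the same bytes as UTF-8 explicitly; `decode_ocr_bytes` is a hypothetical helper, not part of pytesseract.

```python
# Assumption: the OCR output is UTF-8 and starts with a curly closing quote.
raw = "\u201dHello".encode("utf-8")  # bytes e2 80 9d followed by b"Hello"


def decode_ocr_bytes(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, replacing invalid bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


try:
    raw.decode("cp1252")  # cp1252 has no character for byte 0x9d
    failed_at = None
except UnicodeDecodeError as err:
    failed_at = err.start  # index of the offending byte
    print(err)

text = decode_ocr_bytes(raw)
print(ascii(text))  # ascii() keeps the print itself safe on any console
```

If that is what is happening, the decoding step is where the fix belongs: reading the OCR output as UTF-8 explicitly avoids depending on the platform's default encoding.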