python ocr tesseract python-tesseract

Tesseract OCR fails on TIFF files


I have a multi-page .tif file. I am trying to extract text from it using Tesseract OCR, but I am getting this error:

TypeError: Unsupported image object

Code

from PIL import Image
import pytesseract

img = Image.open('Group 1/1_CHE_MDC_1.tif')
text = pytesseract.image_to_string(img.seek(0))  # OCR on 1st Page
text = ' '.join(text.split())
print(text)


Any idea why this is happening?


Solution

  • Image.seek moves the image to the given frame in place and returns None, so you're essentially running:

    pytesseract.image_to_string(None)
    

    Instead do:

    img.seek(0)
    text = pytesseract.image_to_string(img)
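
Since the file has multiple pages, the same seek-then-OCR pattern can be repeated for each frame. Below is a minimal sketch of that idea, not a definitive implementation. The helper name `ocr_all_pages` and its `ocr` parameter are hypothetical (they are not part of PIL or pytesseract). It assumes the image exposes `n_frames`, as Pillow does for multi-page TIFFs, and it falls back to a single page otherwise.

```python
def ocr_all_pages(img, ocr):
    """Run `ocr` on every frame of a multi-page image; return one string per page.

    `img` is assumed to behave like a PIL image opened from a multi-page TIFF
    (it needs seek() and, ideally, n_frames). `ocr` is any callable that takes
    the image and returns text, for example pytesseract.image_to_string.
    """
    texts = []
    # Images without an n_frames attribute are treated as single-page.
    for page in range(getattr(img, "n_frames", 1)):
        img.seek(page)   # moves to the frame in place and returns None
        text = ocr(img)  # pass the image itself, never the result of seek()
        texts.append(" ".join(text.split()))  # collapse whitespace, as in the question
    return texts
```

With the file from the question, this would be called roughly as `ocr_all_pages(Image.open('Group 1/1_CHE_MDC_1.tif'), pytesseract.image_to_string)`, giving a list with one cleaned-up string per page.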