
PyTesseract Error for Multi Page Tiff Image


When I read in a multi-page TIFF image (15 pages; a document with black text on a white background), PyTesseract throws an "OSError: -9" at the step where I loop over the pages and convert each one to a string.

I use the pytesseract package along with pyocr.builders. A single page seems to work fine, but I believe the error occurs when the image is not in RGB mode and the program converts it to RGB.

img = Image.open(r'\users\ai\text.tiff')
img.load()
txt = ""
for frame in range(0, img.n_frames):
    img.seek(frame)
    txt += tool.image_to_string(img,builder=pyocr.builders.TextBuilder())

The expected output is the text of all 15 pages, shown in the Jupyter window.

Error Message

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-17-e59bdf3b773c> in <module>
      2 for frame in range(0, img.n_frames):
      3     img.seek(frame)
----> 4     txt += tool.image_to_string(img,builder=pyocr.builders.TextBuilder())
      5 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyocr\tesseract.py in image_to_string(image, lang, builder)
    357     with tempfile.TemporaryDirectory() as tmpdir:
    358         if image.mode != "RGB":
--> 359             image = image.convert("RGB")
    360         image.save(os.path.join(tmpdir, "input.bmp"))
    361         (status, errors) = run_tesseract("input.bmp", "output", cwd=tmpdir,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\PIL\Image.py in convert(self, mode, matrix, dither, palette, colors)
    932         """
    933 
--> 934         self.load()
    935 
    936         if not mode and self.mode == "P":

~\AppData\Local\Continuum\anaconda3\lib\site-packages\PIL\TiffImagePlugin.py in load(self)
   1097     def load(self):
   1098         if self.use_load_libtiff:
-> 1099             return self._load_libtiff()
   1100         return super(TiffImageFile, self).load()
   1101 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\PIL\TiffImagePlugin.py in _load_libtiff(self)
   1189 
   1190         if err < 0:
-> 1191             raise IOError(err)
   1192 
   1193         return Image.Image.load(self)

OSError: -9
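
A note on reading the traceback: the bare -9 is raised by Pillow's TIFF loader, not by Tesseract, and to my knowledge it matches the "out of memory error" entry in Pillow's internal decoder error table. The sketch below is one way to check which frames fail to decode before any OCR is involved. The table is reproduced from memory and `find_unreadable_frames` is a hypothetical helper name, so treat both as assumptions rather than part of Pillow's API.

```python
# Decoder error codes as I recall them from Pillow's ImageFile module.
# Assumption: verify against the Pillow version you have installed.
PILLOW_DECODER_ERRORS = {
    -1: "image buffer overrun error",
    -2: "decoding error",
    -3: "unknown error",
    -8: "bad configuration",
    -9: "out of memory error",
}


def find_unreadable_frames(image):
    """Return (frame_index, description) for every frame that fails to load.

    `image` is any object with `n_frames`, `seek()` and `load()`, such as
    the object returned by PIL.Image.open() for a multi-page TIFF.
    """
    failures = []
    for index in range(image.n_frames):
        try:
            image.seek(index)
            image.load()  # force decoding here, outside of pyocr
        except OSError as exc:
            # Pillow raises OSError(err), so the numeric code is args[0]
            code = exc.args[0] if exc.args else None
            description = PILLOW_DECODER_ERRORS.get(code, str(exc))
            failures.append((index, description))
    return failures
```

Calling `find_unreadable_frames(Image.open(path))` on the problem file should show whether every page fails or only some of them, which narrows down whether the file itself or the available memory is the likely cause.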

Solution

  • For a question like this, you should supply a Minimal Reproducible Example, as some code has been left out. You should also provide your test image. You cannot attach a multi-page TIFF here, so a link to one would be good.

    I was able to find this test image from this question. It's a 10-page TIFF.

    Here's a solution using pyocr:

    from PIL import Image
    
    import pyocr
    import pyocr.builders
    
    # use the first OCR tool that pyocr finds
    tools = pyocr.get_available_tools()
    tool = tools[0]
    
    image = Image.open('multipage_tiff_example.tif')
    
    for frame in range(image.n_frames):
        # select the page to OCR
        image.seek(frame)
        txt = tool.image_to_string(image, builder=pyocr.builders.TextBuilder())
        print(txt)
    

    And here's a solution using pytesseract:

    from PIL import Image
    import pytesseract
    
    # pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
    
    image = Image.open('multipage_tiff_example.tif')
    
    # set Page Segmentation Mode to 6
    # (i.e. assume a single uniform block of text)
    config = "--psm 6"
    
    txt = ''
    for frame in range(image.n_frames):
        # select the page to OCR
        image.seek(frame)
        txt += pytesseract.image_to_string(image, config=config, lang='eng') + '\n'
    
    print(txt)
    

    Both give this output:

    Multipage
    TIFF
    Example
    Page 1
    Multipage
    TIFF
    Example
    Page 2
    Multipage
    TIFF
    Example
    Page 3
    Multipage
    TIFF
    Example
    Page 4
    Multipage
    TIFF
    Example
    Page5
    Multipage
    TIFF
    Example
    Page 6
    Multipage
    TIFF
    Example
    Page /
    Multipage
    TIFF
    Example
    Page 8
    Multipage
    TIFF
    Example
    Page 9
    Multipage
    TIFF
    
    Example
    
    Page 10
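
Since the output above runs all pages together, it can help to keep one string per page rather than one long string. The following is only a sketch of that idea: `ocr_all_pages` and `join_pages` are hypothetical helper names, and `ocr_page` is whatever callable you choose to pass in (for example a small wrapper around `pytesseract.image_to_string`).

```python
def ocr_all_pages(image, ocr_page):
    """Run `ocr_page` on every frame of `image`; return one string per page.

    `image` needs `n_frames` and `seek()` (a PIL multi-page TIFF has both);
    `ocr_page` is any callable that takes the image and returns text.
    """
    pages = []
    for index in range(image.n_frames):
        image.seek(index)  # select the page to OCR
        pages.append(ocr_page(image).strip())
    return pages


def join_pages(pages):
    """Join page texts with a visible separator so page boundaries stay clear."""
    return "\n".join(
        f"--- page {number} ---\n{text}"
        for number, text in enumerate(pages, start=1)
    )
```

With the pytesseract example above, this could be called as `ocr_all_pages(image, lambda page: pytesseract.image_to_string(page, config="--psm 6", lang="eng"))`, and `join_pages(...)` then gives a single printable string with the page numbers marked.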