
Python Imaging Library (PIL): truncated image problem


I'm using PIL in Python to load and resize a large number of images to feed to a CNN, but while loading them I get this error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-1-9e7a5298cd3e> in <module>
      3 dog_names = ip.labels("dogImages/train")
      4 
----> 5 trn_data, trn_targets = ip.data_loader('dogImages/train', (224, 224))
      6 val_data, val_targets = ip.data_loader('dogImages/valid', (224, 224))
      7 tst_data, tst_targets = ip.data_loader('dogImages/test', (224, 224))

...my address...\libs\img_preprocessing.py in data_loader(path, size)
     48             cat_target.append([1 if pre_label(im)==label else 0 for label in labels(total)])
     49             img = Image.open(im)
---> 50             img = Image.Image.resize(img, size=size)
     51             img = np.array(img)
     52             arr.append(img)

C:\ProgramData\Anaconda3\lib\site-packages\PIL\Image.py in resize(self, size, resample, box, reducing_gap)
   1922             return im.convert(self.mode)
   1923 
-> 1924         self.load()
   1925 
   1926         if reducing_gap is not None and resample != NEAREST:

C:\ProgramData\Anaconda3\lib\site-packages\PIL\ImageFile.py in load(self)
    247                                     break
    248                                 else:
--> 249                                     raise OSError(
    250                                         "image file is truncated "
    251                                         f"({len(b)} bytes not processed)"

OSError: image file is truncated (150 bytes not processed)

I've seen some suggestions about adding this code:

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

But I think that lets defective data into the model, which I don't want. I want to skip corrupted images without crashing the program and still load all the other images, but I can't figure out how. This is the code I use:

from glob import glob

import numpy as np
from PIL import Image

def data_loader(path, size):
    '''
    loading image data
    parameters:
        path => image directory path
        size => output size in tuple
    '''
    total = glob(path + "/*")
    arr = []
    for _dir in total:
        for im in glob(_dir+"/*"):
            img = Image.open(im)
            img = Image.Image.resize(img, size=size)
            img = np.array(img)
            arr.append(img)
    return np.array(arr)
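To see why the error only shows up at resize time, and what `LOAD_TRUNCATED_IMAGES = True` actually changes, here is a small self-contained reproduction. It is a sketch under assumptions: `make_truncated_jpeg` and `try_resize` are hypothetical helpers, and the "corrupted" file is simulated by cutting 150 bytes off the end of an in-memory JPEG built from random pixels.

```python
import io

import numpy as np
from PIL import Image, ImageFile


def make_truncated_jpeg(cut=150):
    """Hypothetical helper: build a JPEG in memory and drop its last `cut` bytes.

    This only simulates a file whose download or copy was interrupted.
    """
    buf = io.BytesIO()
    # Random noise compresses poorly, so the file has plenty of scan data.
    noise = np.random.default_rng(0).integers(0, 256, (256, 256, 3), dtype=np.uint8)
    Image.fromarray(noise).save(buf, format="JPEG")
    data = buf.getvalue()
    return data[: len(data) - cut]


def try_resize(data, size=(224, 224)):
    """Hypothetical helper: True if the bytes decode and resize, False on OSError."""
    img = Image.open(io.BytesIO(data))  # lazy: only the header is parsed here
    try:
        img.resize(size)  # resize() calls load(), which decodes the pixel data
        return True
    except OSError:
        return False


data = make_truncated_jpeg()
ImageFile.LOAD_TRUNCATED_IMAGES = False  # Pillow's default
print(try_resize(data))  # False: the truncated file is rejected
ImageFile.LOAD_TRUNCATED_IMAGES = True
print(try_resize(data))  # True: the image loads anyway, with the undecoded part filled in
ImageFile.LOAD_TRUNCATED_IMAGES = False  # restore the default
```

With the flag at its default, the broken file fails with an OSError, which is exactly what a try/except can catch and skip. With the flag on, Pillow returns an image instead, which is the silently defective input the question wants to keep out of the model.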

Solution

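  • Since the error occurs when you resize, wrap that call in a try/except OSError. Image.open() is lazy: it reads only the file header, and the pixel data isn't decoded until something needs it. In your traceback that is resize(), which calls self.load(). When the error is raised, continue skips the rest of the current iteration and moves on to the next image file.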

    from glob import glob
    
    import numpy as np
    from PIL import Image
    
    def load_data(path, size):
        '''
        loading image data
        parameters:
            path => image directory path
            size => output size in tuple
        '''
        total = glob(path + "/*")
        images = []
        for subdir in total:
            for im in glob(subdir + "/*"):
                try:
                    # Image.open() also raises an OSError subclass for files
                    # it can't identify, so keep it inside the try as well.
                    with Image.open(im) as img:
                        resized = img.resize(size)  # full decode happens here
                except OSError:
                    continue  # skip the corrupted file, go to the next one
                images.append(np.array(resized))
        return np.array(images)
    

    Some other minor things I changed:

    • data_loader sounds more like a class than a function. I recommend verbs for functions, or at least not nouns that sound like they perform actions.
    • As a variable name, arr is both generic (what's in it?) and misleading (it's a list, not an array).
    • Names starting with _ are, by convention, used for "private" attributes, so I renamed _dir to subdir.
    • img.resize(size) is the usual way to call the method; it does the same thing as Image.Image.resize(img, size=size).
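One more thing to watch: the traceback shows your real data_loader appending to cat_target before Image.open is called. If you continue after the label has already been appended, the data and target arrays drift out of step. Below is a minimal sketch that records the label only after the image loads. It assumes each image's class is the name of its parent directory (a stand-in for the pre_label/labels helpers, which aren't shown), and the function name load_data_with_targets is hypothetical. It also converts every image to RGB (an extra choice, so grayscale files don't produce arrays of a different shape) and returns the skipped paths so you can see how much data was dropped.

```python
import os
from glob import glob

import numpy as np
from PIL import Image


def load_data_with_targets(path, size):
    """Load images under path/<class>/*, skipping files PIL can't decode.

    Assumption: the class label is the image's parent directory name.
    Returns (images, one-hot targets, list of skipped file paths).
    """
    class_dirs = sorted(d for d in glob(os.path.join(path, "*")) if os.path.isdir(d))
    num_classes = len(class_dirs)
    images, targets, skipped = [], [], []
    for label_idx, class_dir in enumerate(class_dirs):
        for file_path in sorted(glob(os.path.join(class_dir, "*"))):
            try:
                with Image.open(file_path) as img:
                    # convert() calls load() first, so a truncated file fails
                    # here; RGB also gives every array the same channel count.
                    resized = img.convert("RGB").resize(size)
            except OSError:
                skipped.append(file_path)
                continue  # nothing appended yet, so data and targets stay aligned
            images.append(np.array(resized))
            one_hot = [0] * num_classes
            one_hot[label_idx] = 1
            targets.append(one_hot)
    return np.array(images), np.array(targets), skipped
```

Printing len(skipped) after each call is a cheap way to confirm that only a handful of files are being dropped rather than a whole class.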