I'm working with PIL in Python to load and resize a large number of images to feed to a CNN. During loading, this error occurs:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-1-9e7a5298cd3e> in <module>
3 dog_names = ip.labels("dogImages/train")
4
----> 5 trn_data, trn_targets = ip.data_loader('dogImages/train', (224, 224))
6 val_data, val_targets = ip.data_loader('dogImages/valid', (224, 224))
7 tst_data, tst_targets = ip.data_loader('dogImages/test', (224, 224))
...my address...\libs\img_preprocessing.py in data_loader(path, size)
48 cat_target.append([1 if pre_label(im)==label else 0 for label in labels(total)])
49 img = Image.open(im)
---> 50 img = Image.Image.resize(img, size=size)
51 img = np.array(img)
52 arr.append(img)
C:\ProgramData\Anaconda3\lib\site-packages\PIL\Image.py in resize(self, size, resample, box, reducing_gap)
1922 return im.convert(self.mode)
1923
-> 1924 self.load()
1925
1926 if reducing_gap is not None and resample != NEAREST:
C:\ProgramData\Anaconda3\lib\site-packages\PIL\ImageFile.py in load(self)
247 break
248 else:
--> 249 raise OSError(
250 "image file is truncated "
251 f"({len(b)} bytes not processed)"
OSError: image file is truncated (150 bytes not processed)
I've seen some suggestions about adding this code:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
But I think that allows defective data to enter the model, and I don't want that. I want to skip corrupted images without crashing the program and load all the remaining images, but I can't figure out how. This is the code I use:
def data_loader(path, size):
    '''
    loading image data
    parameters:
    path => image directory path
    size => output size in tuple
    '''
    total = glob(path + "/*")
    arr = []
    for _dir in total:
        for im in glob(_dir+"/*"):
            img = Image.open(im)
            img = Image.Image.resize(img, size=size)
            img = np.array(img)
            arr.append(img)
    return np.array(arr)
Since the error occurs when you attempt to resize, enclose that line in a try/except. When the error is raised, continue skips the rest of the current iteration and moves on to the next image file.
from glob import glob

import numpy as np
from PIL import Image


def load_data(path, size):
    '''
    loading image data
    parameters:
    path => image directory path
    size => output size in tuple
    '''
    total = glob(path + "/*")
    images = []
    for subdir in total:
        for im in glob(subdir + "/*"):
            img = Image.open(im)
            try:
                # resize() decodes the file; a truncated image raises OSError here
                img = img.resize(size)
            except OSError:
                # skip the corrupted file and move on to the next one
                continue
            img = np.array(img)
            images.append(img)
    return np.array(images)
Some other minor things I changed:
- data_loader sounds more like a class than a function. I recommend verbs for functions, or at least not nouns that sound like they perform actions.
- arr is both generic (what's in it?) and misleading (it's a list, not an array).
- Leading underscores (as in _dir) are, by convention, usually used for "private" attributes.
- img.resize(size) is just a simpler way of calling the resize method.
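If it helps, here is a slightly more defensive sketch of the same idea. It rests on a few assumptions that are not in the original code: the function name load_images_skipping_bad and the skipped list are hypothetical, Image.open is moved inside the try because a file with an unreadable header can fail at open time as well, and convert("RGB") is added on the assumption that the CNN expects three-channel input. Returning the skipped paths lets you drop the matching labels so data and targets stay aligned.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def load_images_skipping_bad(path, size):
    """Return (images, skipped): resized images and the paths that failed to load.

    Hypothetical helper; assumes one subdirectory per class under `path`.
    """
    images = []
    skipped = []
    for subdir in sorted(Path(path).iterdir()):
        if not subdir.is_dir():
            continue
        for file in sorted(subdir.iterdir()):
            try:
                # Opening can fail too (unidentifiable file), so keep it in the try.
                with Image.open(file) as img:
                    # convert() forces a full decode; a truncated file raises OSError.
                    # Assumption: the model wants 3-channel RGB input.
                    resized = img.convert("RGB").resize(size)
            except OSError:
                # Remember what was skipped so labels can be filtered to match.
                skipped.append(file)
                continue
            images.append(np.array(resized))
    return np.array(images), skipped
```

Called as images, skipped = load_images_skipping_bad('dogImages/train', (224, 224)), this leaves LOAD_TRUNCATED_IMAGES at its default, so partially decoded images never reach the array. Printing skipped afterwards shows which files to repair or delete.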