UnidentifiedImageError: cannot identify image file

Hello I am training a model with TensorFlow and Keras, and the dataset was downloaded from https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765

This is a zip folder that I split in the following directories:

.
├── test
│   ├── Cat
│   └── Dog
└── train
    ├── Cat
    └── Dog

Test.cat and test.dog have each folder 1000 jpg photos, and train.cat and traing.dog have each folder 11500 jpg photos.

The load is doing with this code:

batch_size = 16

# Data augmentation and preprocess
train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.20) # set validation split

# Train dataset
train_generator = train_datagen.flow_from_directory(
    'PetImages/train',
    target_size=(244, 244),
    batch_size=batch_size,
    class_mode='binary',
    subset='training') # set as training data

# Validation dataset
validation_generator = train_datagen.flow_from_directory(
    'PetImages/train',
    target_size=(244, 244),
    batch_size=batch_size,
    class_mode='binary',
    subset='validation') # set as validation data

test_datagen = ImageDataGenerator(rescale=1./255)
# Test dataset
test_datagen = test_datagen.flow_from_directory(
    'PetImages/test')

THe model is training with the following code:

history = model.fit(train_generator,
                    validation_data=validation_generator,
                    epochs=5)

And i get the following input:

Epoch 1/5
1150/1150 [==============================] - ETA: 0s - loss: 0.0505 - accuracy: 0.9906

But when the epoch is in this point I get the following error:

UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f9e185347d0>

How can I solve this, in order to finish the training?

Thanks

Solution

Try this function to check if the image are all in correct format.

import os
from PIL import Image
folder_path = 'data\img'
extensions = []
for fldr in os.listdir(folder_path):
    sub_folder_path = os.path.join(folder_path, fldr)
    for filee in os.listdir(sub_folder_path):
        file_path = os.path.join(sub_folder_path, filee)
        print('** Path: {}  **'.format(file_path), end="\r", flush=True)
        im = Image.open(file_path)
        rgb_im = im.convert('RGB')
        if filee.split('.')[1] not in extensions:
            extensions.append(filee.split('.')[1])