Hello, I am training a model with TensorFlow and Keras. The dataset was downloaded from https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765
It is a zip archive that I split into the following directories:
.
├── test
│   ├── Cat
│   └── Dog
└── train
    ├── Cat
    └── Dog
test/Cat and test/Dog each contain 1000 JPG photos, and train/Cat and train/Dog each contain 11500 JPG photos.
The data is loaded with this code:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 16
# Data augmentation and preprocessing
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   validation_split=0.20)  # set validation split
# Train dataset
train_generator = train_datagen.flow_from_directory(
    'PetImages/train',
    target_size=(244, 244),
    batch_size=batch_size,
    class_mode='binary',
    subset='training')  # set as training data
# Validation dataset
validation_generator = train_datagen.flow_from_directory(
    'PetImages/train',
    target_size=(244, 244),
    batch_size=batch_size,
    class_mode='binary',
    subset='validation')  # set as validation data
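# Test dataset (rescaling only, no augmentation)
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    'PetImages/test',
    target_size=(244, 244),
    batch_size=batch_size,
    class_mode='binary')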
The model is trained with the following code:
history = model.fit(train_generator,
                    validation_data=validation_generator,
                    epochs=5)
And I get the following output:
Epoch 1/5
1150/1150 [==============================] - ETA: 0s - loss: 0.0505 - accuracy: 0.9906
But when the epoch reaches this point, I get the following error:
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f9e185347d0>
How can I solve this so that the training can finish?
Thanks
Try this script to check whether every image can be opened and decoded.
import os
from PIL import Image

# Root folder containing one sub-folder per class, e.g. 'PetImages/train'
folder_path = 'data/img'
extensions = []
for fldr in os.listdir(folder_path):
    sub_folder_path = os.path.join(folder_path, fldr)
    for filee in os.listdir(sub_folder_path):
        file_path = os.path.join(sub_folder_path, filee)
        # The last path printed before an exception is the file that could not be read
        print('** Path: {} **'.format(file_path), end="\r", flush=True)
        im = Image.open(file_path)
        rgb_im = im.convert('RGB')
        # Record every distinct file extension found
        ext = os.path.splitext(filee)[1]
        if ext not in extensions:
            extensions.append(ext)
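The script above stops at the first file Pillow cannot read. If there may be more than one bad file, a variant that collects all of them in a single pass can save some reruns. The following is only a sketch: `pillow_can_decode`, `find_unreadable_images`, and the `check` parameter are names made up for this example, and it assumes that any file raising `OSError`, `ValueError`, or `SyntaxError` while being decoded should be treated as unreadable.

```python
import os


def pillow_can_decode(path):
    """Return True if Pillow can open and fully decode the file at `path`."""
    # Imported lazily so the directory walk below works without Pillow
    # when a different `check` function is supplied.
    from PIL import Image

    try:
        with Image.open(path) as im:
            im.convert('RGB')  # forces a full decode, not just a header read
        return True
    except (OSError, ValueError, SyntaxError):
        # UnidentifiedImageError is a subclass of OSError, so it lands here.
        return False


def find_unreadable_images(root, check=pillow_can_decode):
    """Walk `root` recursively and return the sorted paths that fail `check`."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not check(path):
                bad.append(path)
    return sorted(bad)


# Example with the directory from the question:
# for path in find_unreadable_images('PetImages/train'):
#     print(path)  # inspect the file first, then delete or move it
```

Since every file in the class folders is tested, non-image files left over from the archive would be reported as well, which is usually what you want before handing the folder to `flow_from_directory`.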