I am working on an image classification Kaggle competition and downloaded some training images from Kaggle.com. I am using transfer learning with ResNet50 on these images, in Keras 2.0 with TensorFlow as the backend (and Python 3).
However, 258 of the 1281 training images trigger a 'Possibly corrupt EXIF data' warning and are ignored when loaded into the ResNet model, very likely due to a Pillow issue.
The output messages look like this:
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 524288 bytes but only got 0. Skipping tag 3
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 393216 bytes but only got 0. Skipping tag 3
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 33554432 bytes but only got 0. Skipping tag 4
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 25165824 bytes but only got 0. Skipping tag 4
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 131072 bytes but only got 0. Skipping tag 3
"Skipping tag %s" % (size, len(data), tag))
(more to come ...)
From the output messages I can tell that these images exist, but not which files they are.
My question is: how can I identify these 258 images so that I can manually remove them from the data set?
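The easiest way that comes to mind is to modify your code to handle one image at a time, then iterate over the images and check which ones generate the warning. The warning text itself does not include the file name, so you need to capture the warnings per file.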
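A minimal sketch of that idea, using Python's standard `warnings` module to record what each file emits while it is loaded. The names `load_with_pillow` and `find_files_with_warnings`, and the `train` directory, are placeholders I made up for illustration. Which Pillow call actually parses the EXIF block seems to depend on the Pillow version, so the loader both decodes the image and, where available, calls `getexif()`; you may need to adjust it so it mirrors whatever your Keras pipeline does.

```python
import warnings
from pathlib import Path


def load_with_pillow(path):
    """Open and fully decode one image with Pillow (hypothetical helper)."""
    # Imported here so the scan helper below can be reused with other loaders.
    from PIL import Image

    with Image.open(path) as img:
        img.load()  # force decoding of the pixel data
        # Assumption: EXIF parsing may only happen on request in some
        # Pillow versions; getexif() exists only in newer releases.
        if hasattr(img, "getexif"):
            img.getexif()


def find_files_with_warnings(paths, load=load_with_pillow, needle="corrupt EXIF"):
    """Return [(path, message), ...] for files whose loading warns or fails."""
    flagged = []
    for path in paths:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # do not suppress repeated warnings
            try:
                load(path)
            except Exception as exc:  # unreadable files are reported as well
                flagged.append((str(path), "error: %r" % (exc,)))
                continue
        for w in caught:
            if needle in str(w.message):
                flagged.append((str(path), str(w.message)))
                break  # one entry per file is enough
    return flagged


if __name__ == "__main__":
    # "train" is a placeholder for your training image directory.
    image_paths = sorted(
        p for p in Path("train").rglob("*")
        if p.suffix.lower() in {".jpg", ".jpeg"}
    )
    for path, message in find_files_with_warnings(image_paths):
        print(path, "->", message)
```

Running this prints one line per flagged file, which you can then inspect or move out of the training directory. If the list is shorter than expected, the warning is probably raised by a different call than the ones used in the loader, so try passing your own `load` function that loads an image the same way your training code does.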