I am using a modified version of the TensorFlow Image Classification tutorial found at this link. I will attach the code that I have at the bottom of the post.
I am trying to use this model to classify images from a much larger dataset of pictures of shapes. This dataset is roughly 23 times the size of the original one in the tutorial, so training the model takes far more computing power. To spare my poor, little laptop, I moved the job over to a Google Compute Engine virtual machine (8 cores, 32 GB of RAM).
The code that I have attached below runs through all of the preliminary steps (importing the dataset, structuring the model, etc.) and then begins the training sequence. At first, all seems fine and well...
Epoch 1/20
200/304 [==================>...........] - ETA: 5:23 - loss: 2.1112 - accuracy: 0.1773
However, somewhere between roughly 60% and 90% of the way through the first epoch, it throws the following exception:
224/304 [=====================>........] - ETA: 4:09 - loss: 2.1010 - accuracy: 0.1820
2023-06-29 07:34:04.667705: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: Input is empty.
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
Traceback (most recent call last):
File "/MOUNT_HD1/gschindl/code/GeoShapeFull.py", line 215, in <module>
history = drop_model.fit(
File "/home/gschindl/.local/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/gschindl/.local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Input is empty.
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]] [Op:__inference_train_function_2877]
This is a strange error to me because the training process starts without issue, and there doesn't seem to be a consistent spot in the first epoch where the training errors out. One difference that I noted (and I believe I addressed) is that the image files are .png in this dataset compared to the .jpg in the original dataset.
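(For anyone debugging something similar: since decode_image/DecodeImage is the node that fails, one way to narrow this down is to scan the dataset for empty or undecodable files. Below is a minimal illustrative sketch, not part of the tutorial code; it assumes the dataset path shown later in the post.)
import pathlib
import tensorflow as tf

data_dir = pathlib.Path("/MOUNT_HD1/gschindl/datasets/new_2d_shapes")
for path in data_dir.glob("*/*.png"):
    raw = tf.io.read_file(str(path))
    if tf.strings.length(raw).numpy() == 0:
        # A zero-byte file would produce exactly this "Input is empty" error.
        print("Empty file:", path)
        continue
    try:
        tf.io.decode_image(raw)
    except tf.errors.InvalidArgumentError:
        print("Undecodable file:", path)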
------------------------------
As promised, the dataset file structure and code:
Dataset File Structure:
|
|-new_2d_shapes
|-Square
| |-Square_562aecd2-2a86-11ea-8123-8363a7ec19e6.png
| |-Square_a9df2a7c-2a96-11ea-8123-8363a7ec19e6.png
| |-....
|-Triangle
| |-Triangle_5624fb26-2a89-11ea-8123-8363a7ec19e6.png
| |-Triangle_56dd1ee8-2a8d-11ee-8123-8363a7ec19e6.png
| |-....
|-Pentagon
| |-Pentagon_aa06095a-2a85-11ea-8123-8363a7ec19e6.png
| |-Pentagon_a9fca126-2a94-11ea-8123-8363a7ec19e6.png
| |-....
|-Hexagon
|-Hexagon_ffff21c6-2a8e-11ea-8123-8363a7ec19e6.png
|-Hexagon_a9eb022a-2a8c-11ea-8123-8363a7ec19e6.png
|-....
Code:
(Notice that I commented out the portion of code responsible for configuring the dataset for performance because I thought it might be the issue. The visualization is also commented out because I am working over an SSH connection.)
# %%
# Running all of the imported packages
import sklearn
import matplotlib.pyplot as plt
import numpy as np
import PIL
# Notice that this import takes a while
# This is amplified if using a virtual environment
print("Beginning to import tensorflow...")
import tensorflow as tf
print("tensorflow has been imported.")
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import pathlib
# %%
# Used for importing the dataset off of the web
# dataset_url = "https://data.mendeley.com/datasets/wzr2yv7r53/1"
# print("Stuck1")
# # Should print "data_dir: C:\Users\Garrett\.keras\datasets\flower_photos.tar"
# data_dir = tf.keras.utils.get_file('2D_geo_shape.tar', origin=dataset_url, extract=True)
# print("data_dir: {}".format(data_dir))
data_dir = "/MOUNT_HD1/gschindl/datasets/new_2d_shapes"
# Should print "data_dir: C:\Users\Garrett\.keras\datasets\flower_photos"
data_dir = pathlib.Path(data_dir).with_suffix('')
print("data_dir: {}".format(data_dir))
image_data = list(data_dir.glob('*/*.png'))
image_count = len(image_data)
print("Number of images found: {}".format(image_count))
# %%
# Sets parameters for the loader
batch_size = 288
img_height = 180
img_width = 180
# %%
# Beginning the splitting and Finding the class names from the training set
# It's good practice to use a validation split when developing your model.
# Use 80% of the images for training and 20% for validation.
print("Beginning the splitting and Finding the class names from the training set")
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
class_names = train_ds.class_names
print(class_names)
## %%
## Configuring the dataset for performance
#AUTOTUNE = tf.data.AUTOTUNE
#train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
#val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
#print("Configured.")
# %%
# Standardizing the data
print("\nStandardizing the data")
# Changing the RGB range from [0, 255] to [0, 1] by using tf.keras.layers.Rescaling
normalization_layer = layers.Rescaling(1./255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0,1]`.
print("\n\nTHE NEW PIXEL VALUES",np.min(first_image), np.max(first_image))
print("Actual image: ", first_image)
# %%
# Creating the model
print("\nCreating the model")
num_classes = len(class_names)
model = Sequential([
    layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
print("\n\nCompleted the model creation process, onto compiling the model")
# %%
# Compiling the Model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
# %%
# Printing the model summary
model.summary()
# %%
# Data augmentation; "creating" more samples to train model on
print("\nBeginning the data augmentation task")
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal",
                          input_shape=(img_height, img_width, 3)),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
    ]
)
# %%
# Visualizing the data augmentation
#plt.figure(figsize=(10, 10))
#for images, _ in train_ds.take(1):
#    for i in range(9):
#        augmented_images = data_augmentation(images)
#        ax = plt.subplot(3, 3, i + 1)
#        plt.imshow(augmented_images[0].numpy().astype("uint8"))
#        plt.axis("off")
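# (Side note, not from the tutorial: over SSH with no display attached, the
# grid above could be written to disk instead of shown, e.g. by ending the
# block with plt.savefig("augmentation_preview.png"); the filename here is
# just a placeholder.)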
# %%
# Adding in Dropout to a new model "drop_model"
print("\nAdding the dropout to the new 'drop_model' object")
drop_model = Sequential([
    data_augmentation,
    layers.Rescaling(1./255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, name="outputs")
])
# %%
# Compiling the drop_model network and training it
print("\nCompiling the drop_model network")
drop_model.compile(optimizer='adam',
                   loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                   metrics=['accuracy'])
drop_model.summary()
print("\n\nBeginning the training on drop_model\n")
epochs = 20
history = drop_model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs,
    steps_per_epoch=image_count // batch_size
)
ANSWER: The autotuning portion of the code that was commented out must stay commented out. If it is enabled, the memory that the process requests grows astronomically.
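To see why, note that a bare .cache() stores every decoded batch in RAM. Here is a rough sketch of a more memory-friendly version of that block, assuming the in-memory cache was the culprit (my own suggestion, not something I ran; the /tmp path is a placeholder):
AUTOTUNE = tf.data.AUTOTUNE
# Prefetch only: overlaps input loading with training without caching in RAM.
train_ds = train_ds.shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.prefetch(buffer_size=AUTOTUNE)
# Alternative: pass a filename to cache() so tf.data spills to disk instead of RAM.
# train_ds = train_ds.cache("/tmp/train_cache").shuffle(1000).prefetch(buffer_size=AUTOTUNE)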
The two changes that I made:
First, I converted the images from .png format to .jpg format. I did this by using the mogrify tool from the ImageMagick package. More information about these file conversions is listed here.
mogrify -format jpg *.png
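If ImageMagick isn't handy, a rough Python equivalent using Pillow might look like the sketch below (my own illustration, assuming the directory layout shown above; JPEG has no alpha channel, so any RGBA images have to be flattened to RGB first):
import pathlib
from PIL import Image

data_dir = pathlib.Path("/MOUNT_HD1/gschindl/datasets/new_2d_shapes")
for png_path in data_dir.glob("*/*.png"):
    # Flatten any alpha channel, since JPEG only supports RGB.
    Image.open(png_path).convert("RGB").save(png_path.with_suffix(".jpg"), "JPEG")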
Second, I removed steps_per_epoch = image_count // batch_size from the .fit setup. I saw that this was an issue when image_count was not divisible by batch_size (and, with validation_split=0.2, train_ds only holds about 80% of image_count anyway, so the computed step count overshoots the training set). You can remove this line without any harm, because .fit will automatically calculate the correct number of steps to take per epoch.
(I was able to get it to run the full first training with an accuracy of 10%!!!)
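For completeness, the working call is just the one from the code above with that line removed; Keras then infers the step count from the dataset itself:
history = drop_model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)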