python tensorflow keras deep-learning

Issue with training Keras model using ModelCheckpoint in Kaggle notebook (Unexpected result of `train_function` (Empty logs))


I'm encountering an issue while trying to train a Keras model in a Kaggle notebook using TensorFlow's ModelCheckpoint callback. Here's my setup and the error I'm facing:

Setup:

I'm building a Keras model for multi-label classification using TensorFlow. Here are the relevant parts of my code:

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, BatchNormalization
from keras.callbacks import TensorBoard
from keras.callbacks import ModelCheckpoint
from keras.optimizers import AdamW

epochs = 4
loss = tf.keras.losses.BinaryCrossentropy(from_logits=False)
classifier_model.compile(optimizer='adam',
                         loss=loss,
                         metrics = 'roc-auc')

print(f'Training model with {tfhub_handle_encoder}')
checkpoint_filepath = '/kaggle/working/tmp_weights.h5'

model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=True,
    monitor='val_loss',
    mode='min',
    save_best_only=True)

history = classifier_model.fit(x=train_ds,
                               validation_data=val_ds,
                               epochs=epochs,
                               callbacks = [model_checkpoint_callback])

Error:

Upon running the training script, I encounter the following error:

ValueError: Unexpected result of `train_function` (Empty logs). This could be due to issues in input pipeline that resulted in an empty dataset. Otherwise, please use `Model.compile(..., run_eagerly=True)`, or `tf.config.run_functions_eagerly(True)` for more information of where went wrong, or file a issue/bug to `tf.keras`.

Additional Context:

  • I'm using a TensorFlow Hub encoder (tfhub_handle_encoder) for text embeddings.
  • train_ds and val_ds are objects containing my training and validation data, respectively, and they are of this format: <_TakeDataset element_spec=(TensorSpec(shape=(None,), dtype=tf.string, name=None), TensorSpec(shape=(None, 6), dtype=tf.int64, name=None))>
  • I've verified that my data loading and preprocessing steps are correct and that train_ds and val_ds are not empty.

Request:

I would appreciate any insights or suggestions on how to resolve this issue with the ModelCheckpoint callback in my Keras training script on Kaggle. Thank you!


Solution

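  • Pass metrics as a list of valid metric identifiers, which is the form the Keras documentation describes. 'roc-auc' is not a metric name Keras recognizes; for ROC AUC, use tf.keras.metrics.AUC, whose curve argument defaults to 'ROC'. The optimizer should stay a single string or optimizer instance, not a list. If the error persists, follow the hint in the message itself and compile with run_eagerly=True to see where the first training step fails.

    loss = tf.keras.losses.BinaryCrossentropy(from_logits=False)
    # multi_label=True computes AUC separately for each label, then averages
    classifier_model.compile(optimizer='adam',
                             loss=loss,
                             metrics=[tf.keras.metrics.AUC(curve='ROC', multi_label=True, name='roc_auc')])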
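
To try the compile and checkpoint setup separately from the TF Hub encoder, here is a minimal sketch on synthetic data. Everything specific in it is an assumption for illustration, not the asker's pipeline: the random float features stand in for the text input, the tiny two-layer model stands in for classifier_model, and the checkpoint goes to a temporary directory instead of /kaggle/working. The file name ends in .weights.h5 because newer Keras releases appear to require that suffix when save_weights_only=True; older releases accept it as well.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Synthetic stand-in data: 32 examples, 8 float features, 6 binary labels each.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 8)).astype("float32")
labels = rng.integers(0, 2, size=(32, 6)).astype("int64")

train_ds = tf.data.Dataset.from_tensor_slices((features[:24], labels[:24])).batch(8)
val_ds = tf.data.Dataset.from_tensor_slices((features[24:], labels[24:])).batch(8)

# Sanity check: pulling one batch raises StopIteration if the pipeline is empty.
first_x, first_y = next(iter(train_ds))

# Hypothetical small model with one sigmoid output per label.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(6, activation="sigmoid"),
])

model.compile(
    optimizer="adam",  # a single optimizer, not a list
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
    metrics=[tf.keras.metrics.AUC(curve="ROC", multi_label=True, name="roc_auc")],
    run_eagerly=True,  # suggested by the error message; gives clearer tracebacks
)

# Weight-only checkpoint in a temp directory (path is an assumption).
checkpoint_path = os.path.join(tempfile.mkdtemp(), "tmp.weights.h5")
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,
    monitor="val_loss",
    mode="min",
    save_best_only=True,
)

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=2,
    callbacks=[checkpoint_cb],
    verbose=0,
)
print(sorted(history.history))  # metric names recorded during training
```

If this runs cleanly but the real script still reports empty logs, the difference is most likely in the input pipeline or the encoder, not in the compile or ModelCheckpoint arguments.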