tensorflow, google-cloud-storage, google-cloud-build

Keras checkpoints not being saved to Google Cloud Storage bucket


I'm using the following code to save checkpoints while a Google Cloud Build job runs my model:

    import tensorflow as tf

    # Save only the weights with the lowest validation loss to the bucket.
    cp_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath="gs://mybucket/checkpoints",
        verbose=0,
        save_weights_only=True,
        monitor='val_loss',
        mode='min',
        save_best_only=True)

I'm getting no errors in my build logs, but the only thing in the bucket after each run is a tf_cloud_train_tar file containing the source directory contents.

I'm using callbacks = [cp_callback] in model.fit.


Solution

  • I had this problem for two reasons:

    • The dataset was not in the storage bucket, so the training code had no access to it.
    • Using a generator for a dataset with no files behind it creates an infinite loop rather than a crash.

    I switched to AI Platform and read my data from the GCS bucket, which fixed the problem.
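
    One plausible way the generator issue leads to missing checkpoints, sketched under assumptions: with save_best_only=True and monitor='val_loss', the callback only writes at the end of an epoch, after validation has run. If model.fit is fed a generator that never stops and is given no steps_per_epoch, the first epoch never ends, so nothing is saved and nothing crashes. The helper names below (steps_for, endless_batches, train) and the tiny model are hypothetical and not from the original post. Depending on the Keras version, checkpoint_path may also need a specific suffix for weights-only saves (newer releases expect `.weights.h5`).

```python
import math


def steps_for(num_examples, batch_size):
    """Number of batches in one full pass over the data."""
    return math.ceil(num_examples / batch_size)


def endless_batches(features, labels, batch_size):
    """Yield (x, y) batches forever, as many hand-written generators do."""
    n = len(features)
    while True:
        for start in range(0, n, batch_size):
            end = start + batch_size
            yield features[start:end], labels[start:end]


def train(features, labels, val_features, val_labels,
          checkpoint_path, batch_size=32, epochs=5):
    """Fit a toy model; features and labels are assumed to be NumPy arrays."""
    # Imported here so the helpers above can be used without TensorFlow.
    import tensorflow as tf

    # Placeholder model, only here to make the sketch complete.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")

    cp_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path,
        save_weights_only=True,
        monitor="val_loss",
        mode="min",
        save_best_only=True)

    return model.fit(
        endless_batches(features, labels, batch_size),
        # Without this bound the epoch never finishes, so val_loss is
        # never computed and the callback never writes a checkpoint.
        steps_per_epoch=steps_for(len(features), batch_size),
        validation_data=(val_features, val_labels),
        epochs=epochs,
        callbacks=[cp_callback])
```

    Calling train(...) with a local directory as checkpoint_path first, then with the gs:// path, is one way to separate a training-loop problem from a bucket-access problem.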