Tags: python, tensorflow, tensorboard

Controlling how long Tensorboard monitors training while fitting model over multiple datasets


I am training a model by looping over several datasets:

import tensorflow as tf

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir,
                                                      histogram_freq=1,
                                                      profile_batch=10)
for data in datasets:
    model.fit(data, callbacks=[tensorboard_callback])

I am trying to monitor GPU usage across these datasets. However, TensorBoard only collects profiling data for about a second, then stops. It also reports that GPU usage during training is nearly perfect, which seems unlikely.

I have tried adjusting the arguments I pass to the TensorBoard callback, but I don't feel close to a solution. So, how does TensorBoard collect its data?

Do I have to gather all the data into one list/dataframe before TensorBoard can collect anything useful?


Solution

  • Create the tensorboard_callback inside the for loop, so a fresh callback instance is passed to each fit() call. The profiler attached to a TensorBoard callback appears to run only once, for the batch specified by profile_batch, so reusing a single instance across multiple fit() calls means only the first run is profiled:

    for data in datasets:
        tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir,
                                                              histogram_freq=1,
                                                              profile_batch=10)
        model.fit(data, callbacks=[tensorboard_callback])
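A runnable sketch of the fix, using a toy model and random stand-in datasets (the names `datasets`, the Dense model, and the `logs/` path are placeholders for the asker's real setup). It also writes each run to its own subdirectory of log_dir, which keeps the runs distinguishable in the TensorBoard UI rather than letting them overwrite one another:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Assumption: any writable directory works as the log root.
log_dir = os.path.join(tempfile.mkdtemp(), "logs")

# Toy stand-ins for the real datasets: three small regression sets.
datasets = [
    tf.data.Dataset.from_tensor_slices(
        (np.random.rand(64, 4).astype("float32"),
         np.random.rand(64, 1).astype("float32"))
    ).batch(8)
    for _ in range(3)
]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

for i, data in enumerate(datasets):
    # A new callback per fit() call re-arms the profiler each time;
    # a per-dataset subdirectory keeps the runs separate in TensorBoard.
    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=os.path.join(log_dir, f"dataset_{i}"),
        histogram_freq=1,
        profile_batch=2,
    )
    model.fit(data, epochs=1, callbacks=[tensorboard_callback], verbose=0)
```

After running, `tensorboard --logdir <log_dir>` shows one run per dataset, each with its own profile trace.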