Tags: tensorflow, logging, callback, tensorboard, history

How can I fill out the filepath argument of the ModelCheckpoint class in tf.keras? (having trouble understanding epoch values and the logs concept)


I found out that in the filepath argument of ModelCheckpoint I can save checkpoints using the epoch value and keys from the logs. However, as I'm a newbie, I'm having a hard time understanding the concept of logs and how TensorBoard uses them. All I know is that logs are saved data describing what events happened in TensorFlow (am I right?)

  1. If my filepath has a format like 'weights.{epoch:02d}-{val_loss:.4f}.hdf5', is '.4f' a log of val_loss? And is '02d' also a log of epoch, or is it an epoch value?
  2. Also, how can I find the '02d' and '.4f' (i.e. the logs and epoch value) information? I assume I need these values to define the filepath.
  3. My final question: how will the result differ from the above format if I just set my filepath as 'weights.hdf5'?

Thanks in advance to anyone who answers! (You are an angel)


Solution

    1. The .4f and 02d are string format specifiers. Specifically, {epoch:02d} means "insert the epoch number, which is an integer (the d part), with a width of at least 2 characters, using leading zeros if necessary (e.g. if the epoch is 1, this outputs 01)". {val_loss:.4f} means "insert the val_loss, which is a float (the f part), with 4 digits after the decimal point". So the filename will contain the current epoch and val_loss values formatted this way (see the sketch after this list).
    2. Because these are ordinary Python string format specifiers, you do not need to look anything up: the values are substituted into the string automatically. ModelCheckpoint supplies the correct epoch number and the val_loss from the logs dictionary at the end of each epoch.
    3. The advantage of the first format is that your checkpoints will not overwrite each other at every epoch (because the current epoch number is part of the filepath). With a plain 'weights.hdf5', each epoch's checkpoint overwrites the previous one; with the templated name you get one file per epoch in the folder and can later pick whichever you like for testing, fine-tuning, etc.
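
Here is a minimal sketch of both points, assuming TensorFlow 2.x. The tiny model and the commented-out fit call are placeholders (x_train, y_train, x_val, y_val are hypothetical) just to show where the callback plugs in:

```python
import tensorflow as tf

# The filepath template follows ordinary Python string formatting:
print("weights.{epoch:02d}-{val_loss:.4f}.hdf5".format(epoch=1, val_loss=0.123456))
# -> weights.01-0.1235.hdf5

# ModelCheckpoint performs this substitution for you at the end of each
# epoch, filling in epoch and any metric found in the logs dictionary.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="weights.{epoch:02d}-{val_loss:.4f}.hdf5",
    monitor="val_loss",
    save_weights_only=True,  # save only the weights, not the full model
)

# Placeholder model so the snippet is self-contained.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# Validation data is needed so that val_loss appears in the logs:
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=10, callbacks=[checkpoint])
# After training, the folder holds one checkpoint per epoch, e.g.
# weights.01-0.5231.hdf5, weights.02-0.4187.hdf5, ...
```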