python, tensorflow, tensorboard

Organizing runs in TensorBoard


I'm working on a probabilistic forecast model using RNNs and want to log multiple runs with different parameters in TensorBoard to evaluate and compare them. I'm quite new to TensorBoard and couldn't really come up with a good way to organize my runs. I want to be able to sort through them in TensorBoard by parameter values, so currently I'm using this rather clunky approach:

tb = SummaryWriter(log_dir=f'runs/leakyrelu/cuda{cuda_id}/m_epochs{max_epochs}/lr{learning_rate}/'
                                f'bs{batch_size}/h_h{history_horizon}/f_h{forecast_horizon}/'
                                f'core_{core_net}/drop_fc{dropout_fc}/'
                                f'drop_core{dropout_core}')

Is there any smart way or convention on how to do this without creating mile-long filenames or directories kilometres deep?


Solution

  • It seems you are doing hyperparameter tuning with multiple parameters.

    The best way to log such runs in TensorBoard is with its HParams plugin.

    Step 1: Importing

    import tensorflow as tf
    from tensorboard.plugins.hparams import api as hp
    

    After that, you create an HParam object for each parameter you want to try different values for, and create a summary writer.

    Step 2: Creating HParam objects and a summary writer

    HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
    HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
    HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
    
    METRIC_ACCURACY = 'accuracy'
    
    with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
      hp.hparams_config(
        hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
      )
    

    Your created objects will look something like this:

    HP_NUM_UNITS
    HParam(name='num_units', domain=Discrete([16, 32]), display_name=None, description=None)
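
    The training and evaluation snippets below use x_train, y_train, x_test and y_test without defining them. A minimal setup, assuming you follow the Fashion-MNIST data used in the official HParams tutorial (your own forecasting dataset would take its place):

    import tensorflow as tf

    # Illustrative data only: Fashion-MNIST as in the official HParams tutorial.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]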
    

    Step 3: Create a function for training and testing

    def train_test_model(hparams):
      model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
      ])
      model.compile(
          optimizer=hparams[HP_OPTIMIZER],
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'],
      )
    
      model.fit(x_train, y_train, epochs=1) # Run with 1 epoch to speed things up for demo purposes
      _, accuracy = model.evaluate(x_test, y_test)
      return accuracy
    

    In this function, hparams is a dictionary of the form:

    {
       HParam Object 1: VALUE-FOR-THE-OBJECT,
       HParam Object 2: VALUE-FOR-THE-OBJECT,   
       HParam Object 3: VALUE-FOR-THE-OBJECT,   
    }
    

    The actual dictionary looks like this:

    {HParam(name='num_units', domain=Discrete([16, 32]), display_name=None, description=None): 32,
     HParam(name='dropout', domain=RealInterval(0.1, 0.2), display_name=None, description=None): 0.2,
     HParam(name='optimizer', domain=Discrete(['adam', 'sgd']), display_name=None, description=None): 'sgd'}
    

    Step 4: Function for logging to TensorBoard

    def run(run_dir, hparams):
      with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the values used in this trial
        accuracy = train_test_model(hparams)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)
    

    Here, run_dir is the path for each individual run (e.g. 'logs/hparam_tuning/run-0', as constructed in the loop below).

    Step 5: Trying different parameter combinations:

    session_num = 0
    
    for num_units in HP_NUM_UNITS.domain.values:
      for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
        for optimizer in HP_OPTIMIZER.domain.values:
          hparams = {
              HP_NUM_UNITS: num_units,
              HP_DROPOUT: dropout_rate,
              HP_OPTIMIZER: optimizer,
          }
          run_name = "run-%d" % session_num
          print('--- Starting trial: %s' % run_name)
          print({h.name: hparams[h] for h in hparams})
          run('logs/hparam_tuning/' + run_name, hparams)
          session_num += 1
    

    Note: num_units will take only the two values 16 and 32, not every value between 16 and 32.
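
    The loop above also only tries the two endpoints of the dropout RealInterval. If you want intermediate values as well, one option is to sample the interval yourself; a small sketch using Python's random module (the three samples are an arbitrary choice):

    import random

    # Draw a few dropout rates uniformly from the RealInterval domain instead of
    # using only its two endpoints.
    dropout_candidates = [
        random.uniform(HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value)
        for _ in range(3)
    ]

    # ...then iterate over dropout_candidates in place of
    # (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value) in the Step 5 loop.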

    Your TensorBoard HParams dashboard will look like this:

    Table view: [screenshot]

    Scatter plot view: [screenshot]
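
    To see these views, start TensorBoard pointed at the log root (for example, tensorboard --logdir logs/hparam_tuning from a terminal, or the %tensorboard notebook magic with the same argument) and open the HPARAMS tab.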

    You can also combine this with the TensorBoard callback in Keras by setting the callback's log directory to run_dir.

    For example:

    def train_test_model(hparams, run_dir):
        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
            tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax)
        ])
        model.compile(
            optimizer=hparams[HP_OPTIMIZER],
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )
        
        callbacks = [
            tf.keras.callbacks.TensorBoard(run_dir),
        ]
        
        model.fit(x_train, y_train, epochs=10, callbacks=callbacks)  # reduce epochs to speed things up when experimenting

        _, accuracy = model.evaluate(x_test, y_test)
        return accuracy
    

    The steps above are good if you want to log custom metrics, or a variety of metrics other than the accuracy and loss defined in the compile method.

    But if you don't need custom metrics and don't want to deal with summary writers etc., you can use Keras callbacks to simplify the process. Complete code with callbacks, without summary writers:

    # Creating Hparams
    HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
    HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
    HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd'])) 
    
    # Creating train test function
    def train_test_model(hparams, run_dir):
        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
            tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax)
        ])
        model.compile(
            optimizer=hparams[HP_OPTIMIZER],
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )
        callbacks = [
            tf.keras.callbacks.TensorBoard(run_dir),  # log metrics
            hp.KerasCallback(run_dir, hparams),  # log hparams
        ]
        model.fit(x_train, y_train, epochs=10, callbacks=callbacks)  # reduce epochs to speed things up when experimenting
        _, accuracy = model.evaluate(x_test, y_test)
        return accuracy 
    
    # Running different configurations
    session_num = 0
    
    for num_units in HP_NUM_UNITS.domain.values:
        for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
            for optimizer in HP_OPTIMIZER.domain.values:
                hparams = {
                    HP_NUM_UNITS: num_units,
                    HP_DROPOUT: dropout_rate,
                    HP_OPTIMIZER: optimizer,
                }
                run_name = "run-%d" % session_num
                print('--- Starting trial: %s' % run_name)
                print({h.name: hparams[h] for h in hparams})
                train_test_model(hparams, 'logs/hparam_tuning/' + run_name)
                session_num += 1
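
    Finally, since your own snippet uses PyTorch's SummaryWriter: torch.utils.tensorboard provides an analogous add_hparams method that logs a dictionary of hyperparameters together with the resulting metrics, so the HParams dashboard can sort and filter runs for you instead of every value being encoded in the directory name. A minimal sketch, where train_and_evaluate and the searched values are hypothetical placeholders:

    from torch.utils.tensorboard import SummaryWriter

    with SummaryWriter(log_dir='runs/hparam_search') as tb:
        for lr in (1e-3, 1e-4):            # hypothetical search values
            for batch_size in (32, 64):
                # Train the RNN and compute a validation metric here.
                val_loss = train_and_evaluate(lr, batch_size)  # hypothetical helper
                tb.add_hparams(
                    {'lr': lr, 'batch_size': batch_size},  # hyperparameters of this trial
                    {'hparam/val_loss': val_loss},         # metrics of this trial
                )

    Each add_hparams call appears as its own row in the HPARAMS table, much like the run-%d directories above.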
    

    Useful Links:

    1. Hyperparameter Tuning with the HParams Dashboard
    2. Hparams demo using all possible Hparam objects - Official Github Repo