Tags: python, tensorflow, machine-learning, edit, tensorboard

How do you edit an existing Tensorboard Training Loss summary?


I've trained my network and logged training and validation losses with the following code (training loss shown; the validation code is equivalent):

train_summary_writer = tf.summary.create_file_writer("/path/to/logs/")
with train_summary_writer.as_default():
    tf.summary.scalar('Training Loss', data=epoch_loss, step=current_step)

After training, I would like to view the loss curves in TensorBoard. However, because I saved them under the tags 'Training Loss' and 'Validation Loss', the curves are plotted on separate charts. I know that using the single tag 'loss' for both would solve this for future writes to the log directory. But how do I edit my existing log files so that the training and validation losses share a tag?
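
For future runs, the single-tag approach can be sketched as follows. This is a minimal sketch assuming TF 2.x; the temporary log directory, the "train"/"valid" subdirectory names and the loss values are made up for illustration. TensorBoard treats each subdirectory as a separate run and overlays runs that share a tag on one chart.

```python
import tempfile
from pathlib import Path

import tensorflow as tf

# Hypothetical log root; in practice this would be your own log directory.
log_root = Path(tempfile.mkdtemp())

# One writer per run subdirectory, both writing the same tag.
train_summary_writer = tf.summary.create_file_writer(str(log_root / "train"))
valid_summary_writer = tf.summary.create_file_writer(str(log_root / "valid"))

# Dummy (step, train_loss, valid_loss) values standing in for a training loop.
history = [(0, 1.00, 1.20), (1, 0.70, 0.95), (2, 0.50, 0.85)]

for step, train_loss, valid_loss in history:
    with train_summary_writer.as_default():
        tf.summary.scalar("loss", data=train_loss, step=step)
    with valid_summary_writer.as_default():
        tf.summary.scalar("loss", data=valid_loss, step=step)

# Make sure everything is on disk before pointing TensorBoard at log_root.
train_summary_writer.flush()
valid_summary_writer.flush()
```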

I attempted to modify the solution in the following post: https://stackoverflow.com/a/55061404, which edits the steps of a log file and rewrites the file; my version changes the tags instead. But I had no success. It also requires importing older TensorFlow code through 'tf.compat.v1'. Is there a way to achieve this (ideally in TF 2.x)?

I had thought to simply read the loss and step values from each log directory and write them to new log files using my working method above, but I only managed to obtain the step, not the loss value itself. Has anyone had any success here?
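
One likely reason for getting the step but not the loss: scalars written with the TF 2.x "tf.summary.scalar" are stored in the "tensor" field of each summary value rather than in "simple_value". The sketch below reads both step and value under that assumption; "read_scalars" is a hypothetical helper name, not part of TensorFlow.

```python
import tensorflow as tf
from tensorflow.core.util.event_pb2 import Event


def read_scalars(event_file, tag):
    """Return a list of (step, value) pairs for `tag` in one event file."""
    points = []
    # Event files are record files, so they can be read as a TFRecordDataset.
    for rec in tf.data.TFRecordDataset([str(event_file)]):
        ev = Event()
        ev.MergeFromString(rec.numpy())
        for v in ev.summary.value:
            if v.tag != tag:
                continue
            if v.HasField("tensor"):
                # TF 2.x style: the scalar is stored as a tensor proto.
                value = float(tf.make_ndarray(v.tensor))
            else:
                # TF 1.x style: the scalar is stored in simple_value.
                value = v.simple_value
            points.append((ev.step, value))
    return points
```

Calling "read_scalars(path_to_event_file, 'Training Loss')" on one of the existing files should then give the pairs needed to rewrite the curve under a new tag.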

---=== EDIT ===---

I managed to fix the problem using the code from @jhedesa.

I had to slightly alter the way the function "rename_events_dir" was called, because I am running TensorFlow inside a Google Colab notebook rather than from the command line. To do this I replaced the final part of the code, which read:

if __name__ == '__main__':
    if len(sys.argv) != 5:
        print(f'{sys.argv[0]} <input dir> <output dir> <old tags> <new tag>',
              file=sys.stderr)
        sys.exit(1)
    input_dir, output_dir, old_tags, new_tag = sys.argv[1:]
    old_tags = old_tags.split(';')
    rename_events_dir(input_dir, output_dir, old_tags, new_tag)
    print('Done')

To read this:

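import os

rootpath = '/path/to/model/'
# Every run directory except the merged 'train' and 'valid' output directories
dirlist = [dirname for dirname in os.listdir(rootpath) if dirname not in ['train', 'valid']]
for dirname in dirlist:
  # old_tags is passed as a list so that whole tags are compared
  rename_events_dir(os.path.join(rootpath, dirname, 'train'), os.path.join(rootpath, 'train'), ['Training Loss'], 'loss')
  rename_events_dir(os.path.join(rootpath, dirname, 'valid'), os.path.join(rootpath, 'valid'), ['Validation Loss'], 'loss')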

Notice that I called "rename_events_dir" twice, once for editing the tags for the training loss, and once for the validation loss tags. I could have used the previous method of calling the code by setting "old_tags = 'Training Loss;Validation Loss'" and using "old_tags = old_tags.split(';')" to split the tags. I used my method simply to understand the code and how it processed the data.
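
One detail about the "v.tag in old_tags" check in the script is worth spelling out: Python's "in" compares whole items when "old_tags" is a list, but does a substring search when it is a plain string. The snippet below illustrates the difference; "tag_matches" is a hypothetical stand-in for the check, not a function from the script.

```python
def tag_matches(tag, old_tags):
    # Same membership test the script applies to each summary value.
    return tag in old_tags


# With a list, only whole tags match.
print(tag_matches("Training Loss", ["Training Loss"]))  # True
print(tag_matches("Loss", ["Training Loss"]))           # False

# With a bare string, `in` becomes a substring test,
# so a partial tag such as "Loss" would also be renamed.
print(tag_matches("Training Loss", "Training Loss"))    # True
print(tag_matches("Loss", "Training Loss"))             # True

# Splitting on ';' always yields a list, even for a single tag.
print("Training Loss".split(";"))                       # ['Training Loss']
```

Passing a list (or splitting on ';' as the original script does) is therefore the safer choice when other tags in the log might be substrings of the one being renamed.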


Solution

  • As mentioned in "How to load selected range of samples in Tensorboard", TensorBoard events are actually stored as record (TFRecord) files, so you can read and process them as such. Here is a script similar to the one posted there, but for the purpose of renaming events and updated to work in TF 2.x.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    
    # rename_events.py
    
    import sys
    from pathlib import Path
    import os
    # Use this if you want to avoid using the GPU
    os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
    import tensorflow as tf
    from tensorflow.core.util.event_pb2 import Event
    
    def rename_events(input_path, output_path, old_tags, new_tag):
        # Make a record writer
        with tf.io.TFRecordWriter(str(output_path)) as writer:
            # Iterate event records
            for rec in tf.data.TFRecordDataset([str(input_path)]):
                # Read event
                ev = Event()
                ev.MergeFromString(rec.numpy())
                # Check if the event carries a summary
                if ev.HasField('summary'):
                    # Iterate summary values
                    for v in ev.summary.value:
                        # Check if the tag should be renamed
                        if v.tag in old_tags:
                            # Rename with new tag name
                            v.tag = new_tag
                writer.write(ev.SerializeToString())
    
    def rename_events_dir(input_dir, output_dir, old_tags, new_tag):
        input_dir = Path(input_dir)
        output_dir = Path(output_dir)
        # Make output directory
        output_dir.mkdir(parents=True, exist_ok=True)
        # Iterate event files
        for ev_file in input_dir.glob('**/*.tfevents*'):
            # Make directory for output event file
            out_file = Path(output_dir, ev_file.relative_to(input_dir))
            out_file.parent.mkdir(parents=True, exist_ok=True)
            # Write renamed events
            rename_events(ev_file, out_file, old_tags, new_tag)
    
    if __name__ == '__main__':
        if len(sys.argv) != 5:
            print(f'{sys.argv[0]} <input dir> <output dir> <old tags> <new tag>',
                  file=sys.stderr)
            sys.exit(1)
        input_dir, output_dir, old_tags, new_tag = sys.argv[1:]
        old_tags = old_tags.split(';')
        rename_events_dir(input_dir, output_dir, old_tags, new_tag)
        print('Done')
    

    You would use it like this:

    > python rename_events.py my_log_dir renamed_log_dir "Training Loss;Validation Loss" loss