I am training a series of models with TensorFlow in Python and showing the results on TensorBoard. This was originally done in a Jupyter notebook, but the problem apparently also occurs on Colab.
Besides the validation results, I also want to show on TensorBoard the evaluation results on a test set. To do that I have created a custom callback overriding the on_train_end
method, which calls res = self.model.evaluate(x=x_test, y=y_test, batch_size=1)
and then tf.summary.scalar
to store the result of the evaluation as a scalar. After all the trainings have finished, I launch TensorBoard. The problem is that TensorBoard is not showing the metric for the last model: given n
models (whose architectures are defined in MODELS_ARCH
), TensorBoard only shows the values of my custom metric for the first n-1
models.
Here is the code (being in a Jupyter notebook, some lines cannot be run from plain Python directly, such as !rm -rf NNlogs/*
which removes the previous logs):
import tensorflow as tf
import numpy as np
import os

root_logdir = os.path.join(os.curdir, "NNlogs")

def get_run_logdir():
    import time
    run_id = time.strftime("%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)
def create_models():
    names = ['Dense5', 'Dense6', 'Dense7', 'Dense8']
    MODELS_ARCH = [
        [tf.keras.layers.Dense(5, activation='tanh')],
        [tf.keras.layers.Dense(6, activation='tanh')],
        [tf.keras.layers.Dense(7, activation='tanh')],
        [tf.keras.layers.Dense(8, activation='tanh')],
    ]
    models = []
    for el in MODELS_ARCH:
        models.append(tf.keras.Sequential(el))
        models[-1]._name = names[len(models) - 1]  # give each model a readable name
    return models
x_train = np.array([[1., 2., 4., 5., 6.], [1., 4., 0.5, 7., 9.], [1., 2.6, 1., 5.6, 6.]])
y_train = np.array([[5.5], [6.], [7.]])
x_val = np.array([[1.7, 5.2, 7.6, 5.2, 6.5], [2.8, 4.2, 0.8, 7.3, 9.4], [1.8, 8.4, 6.6, 6.6, 9.]])
y_val = np.array([[5.5], [6.8], [7.1]])
x_test = np.array([[5.2, 7.7, 9.5, 10.8, 4.4], [2.3, 16., 5.7, 8.8, 9.7], [1.8, 8.4, 7.3, 6.4, 9.]])
y_test = np.array([[5.5], [6.6], [8.1]])
!rm -rf NNlogs/*
lr = 1e-2
models = create_models()
EPOCHS = 10
class OnTrainEndCallback(tf.keras.callbacks.Callback):
    def __init__(self, epochs):
        super(OnTrainEndCallback, self).__init__()
        self.epochs = epochs

    def on_train_end(self, logs=None):
        # Evaluate on the test set and log the result to the current default summary writer
        res = self.model.evaluate(x=x_test, y=y_test, batch_size=1)
        tf.summary.scalar('Model evaluated on test', data=res, step=self.epochs)
on_train_end_cb = OnTrainEndCallback(EPOCHS)
optimizer = tf.keras.optimizers.Adagrad(learning_rate=lr)
histories = []
tests = []
for m in models:
    run_logdir = get_run_logdir() + "_" + m._name
    file_writer = tf.summary.create_file_writer(run_logdir + "/metrics")
    file_writer.set_as_default()
    tensorboard_cb = tf.keras.callbacks.TensorBoard(run_logdir, update_freq='epoch')  # 'batch' or 'epoch'
    m.compile(loss='mse', optimizer=optimizer)
    history = m.fit(x=x_train, y=y_train, epochs=EPOCHS,
                    validation_data=(x_val, y_val),
                    callbacks=[tensorboard_cb, on_train_end_cb], batch_size=32)
    histories.append(history)
    tests.append(m.evaluate(x=x_test, y=y_test, batch_size=1))
If I do:
for m in models:
    print(m.name, m.evaluate(x=x_test, y=y_test, batch_size=1, verbose=0))
it prints the evaluation results for all the models:
Dense5 43.158206939697266
Dense6 44.55398941040039
Dense7 43.6148681640625
Dense8 48.75056457519531
But when I launch TensorBoard with
%load_ext tensorboard
%tensorboard --logdir NNlogs --host localhost
and select the metrics
for all the models in the left menu, it shows the metric for every model except Dense8.
The problem can be seen in this pic: on the bottom left you can see that the metric of the model has been selected, but the upper graph does not show a value for it (trust me, it is not due to the zoom).
Furthermore, I checked the NNlogs
folder for this model and I can see there is a file with extension *.v2
just as for the other models, so I think the metric has been correctly saved.
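To confirm where the value got lost, you can read the events file back directly. A minimal sketch, assuming TensorFlow 2.x; the path is a hypothetical placeholder you need to fill in with your actual run folder:

from tensorflow.core.util import event_pb2
import tensorflow as tf

# Hypothetical path: point this at the *.v2 events file of the missing model
events_path = "NNlogs/<run_id>_Dense8/metrics/<events_file>.v2"

for raw_record in tf.data.TFRecordDataset(events_path):
    event = event_pb2.Event.FromString(raw_record.numpy())
    for value in event.summary.value:
        print(event.step, value.tag)

If the tag 'Model evaluated on test' never shows up for this model, the scalar was still sitting in the writer's buffer and never reached the disk.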
Following this answer, a writer created with tf.summary.create_file_writer
only writes to disk once its buffer is full. Apparently, the buffer only fills up once more than one model evaluation has been written: at the fourth evaluation the third gets written to disk, while the fourth stays in the buffer and, the buffer not being full, is never written. To force the writer to flush the buffer and write its content to disk, it is sufficient to call file_writer.close()
after the for
loop (outside of it).
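In code, the fix amounts to one extra line after the loop (a minimal sketch; the loop body is unchanged from above):

for m in models:
    run_logdir = get_run_logdir() + "_" + m._name
    file_writer = tf.summary.create_file_writer(run_logdir + "/metrics")
    file_writer.set_as_default()
    # ... compile, fit with the callbacks, and evaluate exactly as above ...
    tests.append(m.evaluate(x=x_test, y=y_test, batch_size=1))

# Flush the last writer's buffer so the last model's scalar is written to disk
file_writer.close()

Alternatively, calling tf.summary.flush(writer=file_writer) after the loop should also force the buffered summaries onto disk without closing the writer.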