Tags: python, bash, tensorflow, keras, qsub

Printing training progress with Keras using QSUB and a bash file


I'm able to run a Python script that trains a model using Keras/TensorFlow with the following bash script:

#!/bin/bash
#PBS -N Tarea_UNET
#PBS -l nodes=1:ppn=4:gpus=1
cd $PBS_O_WORKDIR
source $ANACONDA3/activate inictel_uni
python U-NET.py

Inside "U-NET.py" the training call looks like this:

history = model.fit(train_B, train_A, epochs=200, batch_size=20, validation_split=0.052631578, shuffle=True)

The problem is that I can't see the training progress while the job runs, so I can't monitor the metrics or the estimated training time; I have to wait until the whole process finishes. "qstat" only tells me how long the job has been running, which doesn't help. Do you have any ideas?


Solution

  • One simple approach is to provide a callback that Keras invokes at defined points during training (for example, at the end of each epoch). Inside the callback you can do whatever logging or progress reporting you want.

    Here is the high-level documentation and some pre-made callbacks: https://keras.io/callbacks/

    Usage is simple: pass a list of callbacks to fit:

    model.fit(x_train, y_train, ..., callbacks=[<your_callbacks>])
    

    See the examples at the end of that documentation page.

    All the methods you can override are listed here: https://github.com/keras-team/keras/blob/adc321b4d7a4e22f6bdb00b404dfe5e23d4887aa/keras/callbacks.py#L146
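
As a rough illustration of the callback approach, here is a minimal sketch that appends one line per epoch to a text file, with elapsed time and a simple estimate of the time remaining. Batch systems such as PBS typically hold a job's stdout until the job ends, so writing to a file of your own is one way to see progress earlier. The names ProgressWriter, make_keras_callback and progress.log are made up for this example and are not part of Keras. The sketch assumes that TensorFlow's bundled Keras is installed in the job's environment and that the job's working directory is on a filesystem you can read from the login node.

```python
import time


class ProgressWriter:
    """Hypothetical helper: appends one progress line per epoch to a file."""

    def __init__(self, path, total_epochs):
        self.path = path
        self.total_epochs = total_epochs
        self.start = None

    def begin(self):
        # Remember when training started so elapsed time can be computed.
        self.start = time.time()

    def epoch_done(self, epoch, logs=None):
        # Keras passes a 0-based epoch index and a dict of metric values.
        logs = logs or {}
        done = epoch + 1
        elapsed = time.time() - self.start
        # Naive estimate: average time per finished epoch times epochs left.
        eta = elapsed / done * (self.total_epochs - done)
        metrics = " ".join(f"{k}={float(v):.4f}" for k, v in logs.items())
        line = (f"epoch {done}/{self.total_epochs} "
                f"elapsed={elapsed:.0f}s eta={eta:.0f}s {metrics}").rstrip()
        # Open, append and close on every epoch so the line reaches disk
        # right away instead of sitting in a buffer.
        with open(self.path, "a") as f:
            f.write(line + "\n")
        return line


def make_keras_callback(path, total_epochs):
    """Wrap ProgressWriter in a Keras LambdaCallback."""
    # Imported here so the helper above can be used without TensorFlow.
    from tensorflow import keras

    writer = ProgressWriter(path, total_epochs)
    return keras.callbacks.LambdaCallback(
        on_train_begin=lambda logs: writer.begin(),
        on_epoch_end=lambda epoch, logs: writer.epoch_done(epoch, logs),
    )


# Possible usage inside U-NET.py (total_epochs should match epochs):
# progress = make_keras_callback("progress.log", total_epochs=200)
# history = model.fit(train_B, train_A, epochs=200, batch_size=20,
#                     validation_split=0.052631578, shuffle=True,
#                     callbacks=[progress])
```

While the job is running, something like "tail -f progress.log" in the submission directory should then show each epoch as it completes. If the per-epoch metrics alone are enough, the pre-made keras.callbacks.CSVLogger callback may be a simpler choice.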