
Interpreting loss curves in TensorBoard


I have used the pytorch-lightning trainer to train my model. The train set has 6152 samples; the test and validation sets have 769 samples each. Training is done for 10 epochs.

trainer = pl.Trainer(logger = logger, checkpoint_callback = checkpoint_callback, max_epochs = 10, gpus = 1, progress_bar_refresh_rate = 20)

Train loss curve: (screenshot of the TensorBoard training loss curve)

The Y-axis shows the training loss. I expected the X-axis to show epochs (0, 1, 2, ..., 9), but that is not what it shows. Please help me understand what the X-axis of this curve represents.


Solution

  • The X-axis is the number of steps. During each epoch you take N steps; for example, if you're running your training on 1 GPU with batch_size = 4, then N = training_samples / 4. So the maximum number of steps is n_epochs * N.
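The arithmetic above can be sketched as a small helper. Note the batch_size of 4 is the answer's illustrative assumption, not something stated in the question:

```python
import math

def total_training_steps(num_samples, batch_size, num_epochs, drop_last=False):
    """Return the total number of optimizer steps across all epochs,
    i.e. the maximum value shown on TensorBoard's X-axis."""
    if drop_last:
        # Incomplete final batch is dropped.
        steps_per_epoch = num_samples // batch_size
    else:
        # Incomplete final batch still counts as one step (the default).
        steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs

# Question's numbers: 6152 training samples, 10 epochs,
# with a hypothetical batch_size of 4:
print(total_training_steps(6152, 4, 10))  # 1538 steps/epoch * 10 = 15380
```

If your logged curve ends near a value like this rather than at 9, that confirms the X-axis is counting steps, not epochs.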