Tags: python, logging, pytorch-lightning

How do you make the reported step a multiple of the logging frequency in PyTorch-Lightning, not the logging frequency minus 1?


[Warning!! pedantry inside]

I'm using PyTorch Lightning to wrap my PyTorch model, but because I'm pedantic, I find it frustrating that the logger reports steps at the frequency I've asked for, minus 1:

  1. When I set log_every_n_steps=100 in Trainer, my TensorBoard output shows my metrics at steps 99, 199, 299, etc. Why not at 100, 200, 300?
  2. When I set check_val_every_n_epoch=30 in Trainer, the progress bar in my console output goes up to epoch 29, then runs validation, leaving a trail of console output that reports metrics after epochs 29, 59, 89, etc. (a minimal setup that reproduces this is sketched after the output below). Like this:
Epoch 29: 100%|█████████████████████████████| 449/449 [00:26<00:00, 17.01it/s, loss=0.642, v_num=logs]
[validation] {'roc_auc': 0.663, 'bacc': 0.662, 'f1': 0.568, 'loss': 0.633}
Epoch 59: 100%|█████████████████████████████| 449/449 [00:26<00:00, 16.94it/s, loss=0.626, v_num=logs]
[validation] {'roc_auc': 0.665, 'bacc': 0.652, 'f1': 0.548, 'loss': 0.630}
Epoch 89: 100%|█████████████████████████████| 449/449 [00:27<00:00, 16.29it/s, loss=0.624, v_num=logs]
[validation] {'roc_auc': 0.665, 'bacc': 0.652, 'f1': 0.548, 'loss': 0.627}
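
For reference, here is a minimal, self-contained setup that shows the same behaviour; the ToyModule and the random tensors below are stand-ins for my actual model and data:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    import pytorch_lightning as pl

    class ToyModule(pl.LightningModule):
        """Tiny stand-in model, only here to reproduce the logging cadence."""
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(8, 1)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = nn.functional.mse_loss(self.layer(x), y)
            self.log("train_loss", loss)  # written every log_every_n_steps
            return loss

        def validation_step(self, batch, batch_idx):
            x, y = batch
            self.log("val_loss", nn.functional.mse_loss(self.layer(x), y))

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.01)

    data = TensorDataset(torch.randn(512, 8), torch.randn(512, 1))
    train_loader = DataLoader(data, batch_size=4)   # 128 steps per epoch
    val_loader = DataLoader(data, batch_size=4)

    trainer = pl.Trainer(
        max_epochs=90,
        log_every_n_steps=100,       # TensorBoard points land at steps 99, 199, 299, ...
        check_val_every_n_epoch=30,  # validation runs after epochs 29, 59, 89, ...
    )
    trainer.fit(ToyModule(), train_loader, val_loader)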

Am I doing something wrong? Should I simply submit a PR to PL to fix this?


Solution

  • You are not doing anything wrong. Python uses zero-based indexing, so epoch counting starts at zero as well. If you want to change what is displayed, you will need to override the default TQDMProgressBar and modify on_train_epoch_start to display an offset value. You can achieve this by:

    from pytorch_lightning.callbacks.progress.tqdm_progress import TQDMProgressBar, convert_inf

    class LitProgressBar(TQDMProgressBar):
        def init_validation_tqdm(self):
            bar = super().init_validation_tqdm()
            bar.set_description("running validation...")
            return bar

        def on_train_epoch_start(self, trainer, *_) -> None:
            total_train_batches = self.total_train_batches
            total_val_batches = self.total_val_batches
            if total_train_batches != float("inf") and total_val_batches != float("inf"):
                # val can be checked multiple times per epoch
                val_checks_per_epoch = total_train_batches // trainer.val_check_batch
                total_val_batches = total_val_batches * val_checks_per_epoch
            total_batches = total_train_batches + total_val_batches
            self.main_progress_bar.reset(convert_inf(total_batches))
            # display the epoch 1-based instead of the default 0-based
            self.main_progress_bar.set_description(f"Epoch {trainer.current_epoch + 1}")
    

    Notice the +1 in the last line of code. This will offset the epoch displayed in the progress bar. Then pass your custom bar to your trainer:

    import torch
    from pytorch_lightning import Trainer

    # Initialize a trainer
    trainer = Trainer(
        accelerator="auto",
        devices=1 if torch.cuda.is_available() else None,  # limit to 1 device for iPython runs
        max_epochs=3,
        callbacks=[LitProgressBar()],
        log_every_n_steps=100
    )
    

    Finally:

    trainer.fit(mnist_model, train_loader)
    

    For the first epoch this will display:

    GPU available: False, used: False
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    
      | Name | Type   | Params
    --------------------------------
    0 | l1   | Linear | 7.9 K 
    --------------------------------
    7.9 K     Trainable params
    0         Non-trainable params
    7.9 K     Total params
    0.031     Total estimated model params size (MB)
    
    Epoch 1: 17%                        160/938 [00:02<00:11, 68.93it/s, loss=1.05, v_num=4]
    

    and not the default

    Epoch 0: 17%                        160/938 [00:02<00:11, 68.93it/s, loss=1.05, v_num=4]
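
    The same zero-based counting is what puts the TensorBoard points at steps 99, 199, 299, ... (point 1 of the question). A minimal sketch of one way to shift them, assuming the default TensorBoardLogger and gating on a one-based step yourself (the class, metric name, and modulo gate below are illustrative choices, not a built-in option):

    import torch
    from torch import nn
    import torch.nn.functional as F
    import pytorch_lightning as pl

    class OneBasedLoggingModule(pl.LightningModule):
        """Sketch: write TensorBoard points at steps 100, 200, ... instead of 99, 199, ..."""
        def __init__(self, log_every_n: int = 100):
            super().__init__()
            self.layer = nn.Linear(8, 1)
            self.log_every_n = log_every_n

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = F.mse_loss(self.layer(x), y)
            step = self.global_step + 1  # one-based step counter
            if step % self.log_every_n == 0:
                # TensorBoardLogger.experiment is the underlying SummaryWriter,
                # so the step can be passed explicitly instead of relying on self.log
                self.logger.experiment.add_scalar("train_loss", loss.item(), global_step=step)
            return loss

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.01)

    With that, the custom progress bar above handles the epoch display and the explicit step handles the TensorBoard axis.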