I have a question about PyTorch Lightning's CSVLogger that has been bugging me for a couple of weeks now.
When I log the training and validation losses in their respective training_step and validation_step methods, the resulting metrics.csv file puts the two metrics on separate rows. The file looks like this:
epoch | train_loss | val_loss |
---|---|---|
0 | 0.01 | null |
0 | null | 0.02 |
1 | 0.005 | null |
1 | null | 0.01 |
2 | 0.01 | null |
2 | null | 0.02 |
The file also contains a step column that I've omitted here; its value is the same for both rows of a given epoch.
Is there any way to get these onto a single row in the CSV, using the built-in CSVLogger? I couldn't find anything about this online or in the documentation.
The following code reproduces the problem described above:
import torch
from torch.nn import functional as F
from torch.utils.data import TensorDataset
import lightning as pl
from lightning.pytorch.loggers import CSVLogger
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# toy 4-feature, 3-class dataset
iris = load_iris()
features, target = iris.data, iris.target
train_features, val_features, train_target, val_target = train_test_split(features, target, test_size=0.2)
train_features = torch.tensor(train_features).float()
val_features = torch.tensor(val_features).float()
train_target = torch.tensor(train_target).long()
val_target = torch.tensor(val_target).long()
# build the train/val dataloaders directly from the tensor datasets
dm = pl.LightningDataModule.from_datasets(
    train_dataset=TensorDataset(train_features, train_target),
    val_dataset=TensorDataset(val_features, val_target),
    batch_size=5,
)
class Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 3)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.layer(x)
        loss = F.cross_entropy(y_hat, y)
        # on_step=False, on_epoch=True -> one aggregated value per epoch
        self.log("train_loss", loss, prog_bar=True, on_step=False, on_epoch=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.layer(x)
        loss = F.cross_entropy(y_hat, y)
        self.log("val_loss", loss, prog_bar=True, on_step=False, on_epoch=True)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

    def forward(self, x):
        return self.layer(x)
model = Model()
# pass the CSVLogger explicitly so the run always writes metrics.csv
trainer = pl.Trainer(max_epochs=10, logger=CSVLogger("logs"))
trainer.fit(model, dm)
PyTorch Lightning version: 2.2.2
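For reference, the CSVLogger writes the file inside its log_dir (with the explicit logger above, something like logs/lightning_logs/version_0/metrics.csv). A minimal sketch to locate and inspect it after training, assuming the trainer from the snippet above:

import os
import pandas as pd

# CSVLogger exposes the current run directory via .log_dir,
# e.g. "logs/lightning_logs/version_0"
metrics_path = os.path.join(trainer.logger.log_dir, "metrics.csv")
print(pd.read_csv(metrics_path).head())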
As a workaround, when I post-process my data I just do:
import pandas as pd

log = pd.read_csv("path_to_log_file.csv")
log = log.groupby('epoch').mean()  # merge the train and val rows of each epoch
log['epoch'] = log.index           # because "epoch" gets turned into the index
log.index.name = ''                # to remove the name "epoch" from the index
Works fine for me in pandas v1.4.2 (not sure about others), since .mean() skips the NaNs by default (skipna=True), so the single non-null value in each column survives the merge.
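An equivalent merge that avoids averaging altogether is to take the first non-null entry per column within each epoch. A minimal sketch on the same file (the path is a placeholder, as above):

import pandas as pd

log = pd.read_csv("path_to_log_file.csv")

# first() returns the first non-null value per column within each group,
# so the train and val rows of an epoch collapse into one without averaging
merged = log.groupby('epoch', as_index=False).first()

This also keeps integer columns such as step intact, whereas .mean() casts everything to float.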