python pytorch conv-neural-network classification metrics

Validation accuracy and loss are the same after each epoch

My validation accuracy is the same after every epoch, and I'm not sure what I'm doing wrong. I have added my training and validation functions below. I initialise the CNN once. The training function works fine: the loss goes down and the accuracy increases each epoch. I wrote a test function with the same structure as my validation function, and the same thing happens. My train/val split is 40000/10000, and I am using CIFAR-10.

Below is my code:


#Make train function (simple at first)
def train_network(model, optimizer, train_loader, num_epochs=10):

  total_epochs = notebook.tqdm(range(num_epochs))
  model.train()

  for epoch in total_epochs:
    train_acc = 0.0
    running_loss = 0.0

    for i, (x_train, y_train) in enumerate(train_loader):
      x_train, y_train = x_train.to(device), y_train.to(device)

      y_pred = model(x_train)
      loss = criterion(y_pred, y_train)
    
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()

      running_loss += loss.item()
      train_acc += accuracy(y_pred, y_train)

    running_loss /= len(train_loader)
    train_acc /= len(train_loader)

    print('Evaluation Loss: %.3f | Evaluation Accuracy: %.3f'%(running_loss, train_acc))


@torch.no_grad()
def validate_network(model, optimizer, val_loader, num_epochs=10):
  model.eval()
  total_epochs = notebook.tqdm(range(num_epochs))


  for epoch in total_epochs:  
    accu = 0.0
    running_loss = 0.0

    for i, (x_val, y_val) in enumerate(val_loader):
      x_val, y_val = x_val.to(device), y_val.to(device)

      val_pred = model(x_val)
      loss = criterion(val_pred, y_val)

      running_loss += loss.item()
      accu += accuracy(val_pred, y_val)

    running_loss /= len(val_loader)
    accu /= len(val_loader)

    
    print('Val Loss: %.3f | Val Accuracy: %.3f'%(running_loss,accu))

OUTPUT:

Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786

So I guess my question is: how do I get a representative accuracy and loss per epoch when validating?

Solution

What happens here is that the validation function loops num_epochs times and simply tests the same, unchanged network on the same data each time, so every pass gives identical results. I would recommend calling the validation function from the training function at the end of each epoch, so you can see how each epoch of training changes the model's performance. The training function would then look something like:

def train_network(model, optimizer, train_loader, val_loader, num_epochs=10):

  total_epochs = notebook.tqdm(range(num_epochs))

  for epoch in total_epochs:
    # validate_network() switches the model to eval mode, so switch back
    # to train mode at the start of every epoch
    model.train()
    train_acc = 0.0
    running_loss = 0.0

    for i, (x_train, y_train) in enumerate(train_loader):
      x_train, y_train = x_train.to(device), y_train.to(device)

      y_pred = model(x_train)
      loss = criterion(y_pred, y_train)

      loss.backward()
      optimizer.step()
      optimizer.zero_grad()

      running_loss += loss.item()
      train_acc += accuracy(y_pred, y_train)

    running_loss /= len(train_loader)
    train_acc /= len(train_loader)

    print('Train Loss: %.3f | Train Accuracy: %.3f'%(running_loss, train_acc))
    # one validation pass with the weights as they are after this epoch
    validate_network(model, optimizer, val_loader, num_epochs=1)

Notice that I added the validation loader as an input and called the validation function at the end of each epoch, setting its number of epochs to 1. A small additional change would be to remove the epoch loop from the validation function altogether, since it now only needs to make a single pass over the validation data per call.
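As a rough sketch of that last suggestion, the validation function without its epoch loop might look like the code below. Several things here are assumptions rather than part of the original post: the name validate_one_epoch is hypothetical (chosen so it does not clash with validate_network above), criterion and device are passed in as arguments instead of being read from globals, the unused optimizer argument is dropped, and accuracy is a stand-in for the asker's helper, which is not shown.

```python
import torch
import torch.nn as nn


def accuracy(logits, targets):
    # Stand-in for the asker's helper (not shown in the question):
    # fraction of correct top-1 predictions in one batch.
    return (logits.argmax(dim=1) == targets).float().mean().item()


@torch.no_grad()
def validate_one_epoch(model, val_loader, criterion, device="cpu"):
    """Make a single pass over val_loader; return (mean loss, mean accuracy)."""
    model.eval()  # dropout off, batch norm uses its running statistics
    running_loss = 0.0
    accu = 0.0

    for x_val, y_val in val_loader:
        x_val, y_val = x_val.to(device), y_val.to(device)

        val_pred = model(x_val)
        running_loss += criterion(val_pred, y_val).item()
        accu += accuracy(val_pred, y_val)

    # Average over batches, as in the original code
    running_loss /= len(val_loader)
    accu /= len(val_loader)

    print('Val Loss: %.3f | Val Accuracy: %.3f' % (running_loss, accu))
    return running_loss, accu
```

With this version, the call at the end of each training epoch would become something like validate_one_epoch(model, val_loader, criterion, device). Returning the two numbers, rather than only printing them, also makes it easy to collect them in a list for plotting or early stopping. Because the function leaves the model in eval mode, the training code needs to call model.train() again before the next epoch.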