I am trying to build a PyTorch classifier for a tabular dataset. My model has the following architecture:
import torch
from torch import nn

LR = 1e-3

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.input_layer = nn.Linear(X.shape[1], HIDDEN_NEURONS)
        self.linear = nn.Linear(HIDDEN_NEURONS, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.input_layer(x)
        x = self.linear(x)
        x = self.sigmoid(x)
        return x
The model is small and simple. It is trained with the following loop:
total_loss_train_plot = []
total_loss_validation_plot = []
total_acc_train_plot = []
total_acc_validation_plot = []

for epoch in range(EPOCHS):
    total_acc_train = 0
    total_loss_train = 0
    total_acc_val = 0
    total_loss_val = 0

    ## Training and Validation
    for indx, data in enumerate(train_dataloader):
        input, label = data
        prediction = model(input).squeeze(1)
        batch_loss = criterion(prediction, label)
        total_loss_train += batch_loss.item()
        acc = ((prediction).round() == label).sum().item()
        total_acc_train += acc

    ## Validation
    with torch.no_grad():
        for indx, data in enumerate(validation_dataloader):
            input, label = data
            prediction = model(input).squeeze(1)
            batch_loss = criterion(prediction, label)
            total_loss_train += batch_loss.item()
            acc = ((prediction).round() == label).sum().item()
            total_acc_val += acc

    total_loss_train_plot.append(round(total_loss_train/1000, 4))
    total_loss_validation_plot.append(round(total_loss_val/1000, 4))
    total_acc_train_plot.append(round(total_acc_train/len(training_data)*100, 4))
    total_acc_validation_plot.append(round(total_acc_val/len(validation_data)*100, 4))

    print(f'''Epoch no. {epoch + 1} Train Loss: {total_loss_train/1000:.4f} Train Accuracy: {(total_acc_train/len(training_data)*100):.4f} Validation Loss: {total_loss_val/1000:.4f} Validation Accuracy: {(total_acc_val/len(validation_data)*100):.4f}''')
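A side note on the loop above: the validation branch adds each batch loss to total_loss_train rather than total_loss_val. As a result total_loss_val never leaves 0, any validation loss it reports is 0.0000, and the train loss is inflated by the validation batches. The loop as posted also shows no backward pass or optimizer step; those lines may simply have been left out of the excerpt. The sketch below is one possible way to keep the two splits apart. EpochStats and the numbers fed into it are hypothetical stand-ins for illustration, not part of the original code.

```python
from dataclasses import dataclass


@dataclass
class EpochStats:
    """Running totals for one data split within one epoch (hypothetical helper)."""
    loss_sum: float = 0.0
    correct: int = 0
    batches: int = 0
    samples: int = 0

    def add_batch(self, batch_loss: float, n_correct: int, n_samples: int) -> None:
        self.loss_sum += batch_loss
        self.correct += n_correct
        self.batches += 1
        self.samples += n_samples

    @property
    def mean_loss(self) -> float:
        # Average over the batches actually seen, not a fixed divisor such as 1000.
        return self.loss_sum / self.batches if self.batches else 0.0

    @property
    def accuracy(self) -> float:
        # Percentage of correctly classified samples.
        return 100.0 * self.correct / self.samples if self.samples else 0.0


# One object per split, so a validation loss cannot end up in the train total.
train_stats, val_stats = EpochStats(), EpochStats()

# Made-up (loss, correct, batch size) triples standing in for batch_loss.item()
# and the count of correct rounded predictions in the real loop.
for loss, correct, size in [(0.70, 5, 8), (0.60, 6, 8)]:
    train_stats.add_batch(loss, correct, size)
for loss, correct, size in [(0.50, 7, 8)]:
    val_stats.add_batch(loss, correct, size)

print(f"Train Loss: {train_stats.mean_loss:.4f} Train Accuracy: {train_stats.accuracy:.4f}")
print(f"Validation Loss: {val_stats.mean_loss:.4f} Validation Accuracy: {val_stats.accuracy:.4f}")
```

In the real loop, each add_batch call would receive batch_loss.item(), the count of correct predictions, and label.size(0).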
When I run this, the loss and accuracy do not improve; they stay essentially constant:
Epoch no. 1 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
Epoch no. 2 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
Epoch no. 3 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
Epoch no. 4 Train Loss: 105.8375 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
Epoch no. 5 Train Loss: 105.8375 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
Epoch no. 6 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
Epoch no. 7 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
Epoch no. 8 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
Epoch no. 9 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
Epoch no. 10 Train Loss: 105.8375 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
But when I changed the loss to BCEWithLogitsLoss and removed the sigmoid layer, training worked fine: the loss went down and the accuracy went up. With the logits-based loss I get these results:
Epoch no. 1 Train Loss: 0.7597 Train Accuracy: 96.7476 Validation Loss: 0.0000 Validation Accuracy: 98.8270
Epoch no. 2 Train Loss: 0.9141 Train Accuracy: 96.2841 Validation Loss: 0.0000 Validation Accuracy: 98.6070
Epoch no. 3 Train Loss: 0.6364 Train Accuracy: 97.2189 Validation Loss: 0.0000 Validation Accuracy: 98.1305
Epoch no. 4 Train Loss: 0.7539 Train Accuracy: 96.5748 Validation Loss: 0.0000 Validation Accuracy: 98.8270
Epoch no. 5 Train Loss: 0.8025 Train Accuracy: 96.6062 Validation Loss: 0.0000 Validation Accuracy: 96.8109
Epoch no. 6 Train Loss: 0.6069 Train Accuracy: 96.8340 Validation Loss: 0.0000 Validation Accuracy: 98.9370
Epoch no. 7 Train Loss: 0.6626 Train Accuracy: 96.8261 Validation Loss: 0.0000 Validation Accuracy: 96.2977
Epoch no. 8 Train Loss: 0.5833 Train Accuracy: 96.6140 Validation Loss: 0.0000 Validation Accuracy: 98.6804
Epoch no. 9 Train Loss: 0.4303 Train Accuracy: 97.3604 Validation Loss: 0.0000 Validation Accuracy: 98.2405
Epoch no. 10 Train Loss: 0.5376 Train Accuracy: 97.0225 Validation Loss: 0.0000 Validation Accuracy: 96.9208
I know the difference between the two functions: BCELoss expects probabilities (the output of the sigmoid), while BCEWithLogitsLoss expects the raw logits (before the sigmoid). But why does the network behave so differently when I switch between them? I have used BERT for binary text classification with BCELoss and it worked perfectly fine.
Any explanation for this?
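One plausible mechanism, sketched in plain Python rather than PyTorch: when the logits are very large, the sigmoid rounds to exactly 0 or 1. BCELoss is documented to clamp its log terms at -100, so a confidently wrong sample costs about 100, and the gradient that flows back through the saturated sigmoid is zero, so nothing updates. BCEWithLogitsLoss works on the logit directly in a numerically stable form and keeps a usable gradient. The function names below are hypothetical, the arithmetic is float64 (PyTorch defaults to float32, where saturation starts at smaller logits, roughly around 17), and the 1e-12 floor in the gradient is an assumption of this sketch.

```python
import math


def sigmoid(z):
    """Logistic function, written so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clamped_log(p):
    """log(p) with a floor of -100, emulating the clamp documented for BCELoss."""
    return max(math.log(p), -100.0) if p > 0.0 else -100.0


def bce_on_probability(z, y):
    """Sigmoid followed by BCE on the probability; returns (loss, d loss / d z)."""
    p = sigmoid(z)
    loss = -(y * clamped_log(p) + (1.0 - y) * clamped_log(1.0 - p))
    # Chain rule: dL/dp times dp/dz, where dp/dz = p * (1 - p).
    # The 1e-12 floor keeps the division finite (assumption of this sketch).
    dloss_dp = (p - y) / max(p * (1.0 - p), 1e-12)
    grad = dloss_dp * p * (1.0 - p)
    return loss, grad


def bce_on_logit(z, y):
    """Stable form max(z, 0) - z*y + log(1 + exp(-|z|)); its gradient is sigmoid(z) - y."""
    loss = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return loss, sigmoid(z) - y


# A confidently wrong prediction: large positive logit, true label 0.
z, y = 40.0, 0.0
naive_loss, naive_grad = bce_on_probability(z, y)
stable_loss, stable_grad = bce_on_logit(z, y)
print(naive_loss, naive_grad)    # loss pinned at the clamp, gradient vanishes
print(stable_loss, stable_grad)  # loss grows with the logit, gradient stays usable
```

Under these assumptions the first pair is a loss of 100 with a gradient of 0, and the second is a loss of 40 with a gradient of 1. That would fit a training loss that sits at a constant value while accuracy never moves.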
I found the problem: I had to normalize my data. I normalized it with the following code, and after that training worked fine:
for column in data_df.columns:
    data_df[column] = data_df[column]/data_df[column].abs().max()
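A likely reason this helps, shown as a small plain-Python sketch: with unscaled features, even modest weights can produce a logit so large that the sigmoid returns exactly 1.0, whereas after max-abs scaling every feature lies in [-1, 1] and the logit stays small. The column names, values, and weights are made up for illustration, max_abs_scale is a hypothetical helper that mirrors the pandas loop above, and the guard for an all-zero column is an addition that the original loop does not have.

```python
import math


def sigmoid(z):
    """Logistic function, written so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def max_abs_scale(columns):
    """Divide every column by its largest absolute value.

    An all-zero column is left at zero instead of dividing by zero.
    """
    scaled = {}
    for name, values in columns.items():
        peak = max(abs(v) for v in values)
        scaled[name] = [v / peak if peak else 0.0 for v in values]
    return scaled


# Hypothetical tabular data: two features on very different scales.
raw = {"income": [52000.0, 48000.0, 61000.0], "age": [31.0, 45.0, 27.0]}
# Made-up weights, standing in for a freshly initialised linear layer.
weights = {"income": 0.01, "age": -0.02}


def logit(columns, row):
    """Weighted sum of one row's features (bias omitted for brevity)."""
    return sum(weights[name] * columns[name][row] for name in columns)


scaled = max_abs_scale(raw)
raw_logit = logit(raw, 0)
scaled_logit = logit(scaled, 0)
print(raw_logit, sigmoid(raw_logit))        # huge logit, sigmoid saturates
print(scaled_logit, sigmoid(scaled_logit))  # small logit, sigmoid near 0.5
```

With the raw values the sigmoid output is exactly 1.0, so a probability-based loss has nothing to work with. With the scaled values the output sits close to 0.5 and can move in either direction.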