I am training the LeNet CNN model on the CIFAR10 dataset, using PyTorch on Google Colab. The code works only when I use the Adam optimizer with model.parameters() as its only argument. But when I change the optimizer or pass the weight_decay parameter, the accuracy stays at 10% through all the epochs. I cannot understand why this is happening.
# Imports needed by the snippet below
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as utils_data
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# CNN Model - LeNet
class LeNet_ReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn_model = nn.Sequential(nn.Conv2d(3, 6, 5),
                                       nn.ReLU(),
                                       nn.AvgPool2d(2, stride=2),
                                       nn.Conv2d(6, 16, 5),
                                       nn.ReLU(),
                                       nn.AvgPool2d(2, stride=2))
        self.fc_model = nn.Sequential(nn.Linear(400, 120),
                                      nn.ReLU(),
                                      nn.Linear(120, 84),
                                      nn.ReLU(),
                                      nn.Linear(84, 10))

    def forward(self, x):
        x = self.cnn_model(x)
        x = x.view(x.size(0), -1)
        x = self.fc_model(x)
        return x
# Importing dataset and creating dataloaders
batch_size = 128
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                        transform=transforms.ToTensor())
trainloader = utils_data.DataLoader(trainset, batch_size=batch_size, shuffle=True)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True,
                                       transform=transforms.ToTensor())
testloader = utils_data.DataLoader(testset, batch_size=batch_size, shuffle=False)
# Creating instance of the model
net = LeNet_ReLU()

# Evaluation function
def evaluation(dataloader):
    total, correct = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for data in dataloader:
            inputs, labels = data
            outputs = net(inputs)
            _, pred = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (pred == labels).sum().item()
    return correct / total * 100
# Loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(net.parameters(), weight_decay = 0.9)
# Model training
loss_epoch_arr = []
max_epochs = 16

for epoch in range(max_epochs):
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        outputs = net(inputs)
        loss = loss_fn(outputs, labels)
        loss.backward()
        opt.step()
        opt.zero_grad()
    loss_epoch_arr.append(loss.item())
    print('Epoch: %d/%d, Test acc: %0.2f, Train acc: %0.2f'
          % (epoch, max_epochs, evaluation(testloader), evaluation(trainloader)))

plt.plot(loss_epoch_arr)
The weight decay mechanism penalizes large weights: it constrains the weights to relatively small values by adding the sum of their squared values, multiplied by the weight_decay argument you gave it, to the loss. You can think of it as a quadratic (L2) regularization term.

When you pass a large weight_decay value, you may constrain your network too much and prevent it from learning. That is probably why it stayed at 10% accuracy, which corresponds to not learning at all and just guessing the answer (since you have 10 classes, you get 10% accuracy when the output is not a function of the input at all).
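To make that concrete, here is a minimal sketch of the equivalence (my own illustration, not from your code), assuming plain SGD, where the optimizer's weight_decay matches an explicit quadratic penalty exactly; with Adam the decay is folded into the adaptive update, so the two differ slightly:

import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
x, y = torch.randn(8, 4), torch.randn(8, 2)
wd = 0.1

# Two identical models
m1 = nn.Linear(4, 2)
m2 = copy.deepcopy(m1)

# Variant 1: the optimizer's built-in weight decay
opt1 = optim.SGD(m1.parameters(), lr=0.01, weight_decay=wd)
loss1 = nn.functional.mse_loss(m1(x), y)
opt1.zero_grad(); loss1.backward(); opt1.step()

# Variant 2: explicit quadratic penalty added to the loss.
# weight_decay adds wd * p to each gradient, which is the
# gradient of 0.5 * wd * ||p||^2.
opt2 = optim.SGD(m2.parameters(), lr=0.01)
penalty = 0.5 * wd * sum(p.pow(2).sum() for p in m2.parameters())
loss2 = nn.functional.mse_loss(m2(x), y) + penalty
opt2.zero_grad(); loss2.backward(); opt2.step()

# The resulting parameters match (up to floating point noise)
for p1, p2 in zip(m1.parameters(), m2.parameters()):
    print(torch.allclose(p1, p2, atol=1e-6))  # True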
The solution would be to play around with different values; train with a weight_decay of 1e-4 or some other value in that range. Note that as you bring the value closer to zero, you should get results closer to your initial training run without weight decay.
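Applied to your snippet, that would mean, for example:

opt = optim.Adam(net.parameters(), weight_decay=1e-4)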
Hope that helps.