I built a toy CNN model to fit a single pair of random tensors (input_tensor & truth).
import torch
import torch.nn as nn
import torch.optim as optim

batch_size = 1
channel = 3
input_size = 128
input_tensor = torch.rand((batch_size, channel, input_size, input_size))
truth = torch.rand((batch_size, channel, input_size, input_size))

device = torch.device("cuda")

class ConvModel(nn.Module):
    def __init__(self):
        super(ConvModel, self).__init__()
        # Two 3x3 convolutions with stride 1 and padding 1, so the spatial size is preserved.
        self.conv1 = nn.Conv2d(3, 57344, (3, 3), (1, 1), padding=1)
        self.conv2 = nn.Conv2d(57344, 3, (3, 3), (1, 1), padding=1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, input_):
        x = self.conv1(input_)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.sigmoid(x)
        return x

model = ConvModel().to(device)
loss_func = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-5)

for epoch in range(100):
    output = model(input_tensor.to(device))
    loss = loss_func(output, truth.to(device))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if (1 + epoch) % 10 == 0:
        print(loss.detach().item())
I used the above code to generate the input/output pair and train the model, and I got the following loss values:
0.08877705037593842
0.08524381369352341
0.08396070450544357
0.0834180936217308
0.08318136632442474
0.08298520743846893
0.08282201737165451
0.08265350759029388
0.08248833566904068
0.08231770992279053
I'm confused that my model can barely fit ONE pair of data in 100 EPOCHS. Is there any problem?
Thanks for any feedback.
Note that the convolution kernel is shared spatially. Your network is essentially trying to map each random 5*5 patch to a random value (5 is the receptive field of the output layer, since you stack two 3*3 convolutions), and you have 128*128 such patch/target pairs, even though you only have one pair of tensors. So your network fails to overfit your dataset. Reducing the input_size may help you reduce the loss.
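As a rough illustration of that suggestion (not a verified run, and input_size = 8 is an arbitrary choice for the sketch): with a much smaller spatial size there are only 8*8 = 64 receptive-field/target pairs to memorize instead of 128*128, so the same model and training loop should drive the loss much lower within the same 100 steps.

# Minimal sketch: reuse the model and loop above, but on a smaller input.
input_size = 8
input_tensor = torch.rand((batch_size, channel, input_size, input_size))
truth = torch.rand((batch_size, channel, input_size, input_size))

model = ConvModel().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-5)

for epoch in range(100):
    output = model(input_tensor.to(device))
    loss = loss_func(output, truth.to(device))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if (1 + epoch) % 10 == 0:
        print(loss.detach().item())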