Tags: deep-learning, pytorch, batch-normalization, transfer-learning

pytorch - loss.backward() and optimizer.step() in eval mode with batch norm layers?


I have a ResNet-8 network that I am using for a project on Domain Adaptation over images. I have trained the network on one dataset, and now I want to evaluate it on another dataset, simulating a real-time environment where I try to predict one image at a time. Here comes the fun part:

The way I want to do the evaluation on the target dataset is, for each image, to do a forward pass in train mode so that the batch norm statistics get updated (under torch.no_grad(), since I don't want to update the network parameters, only "adapt" the batch norm layers), and then do a second forward pass in eval mode to get the actual prediction, so that the batch norm layers use a mean and variance based on the whole set of images seen so far (and not only those of the current batch, a single image in this case):

optimizer.zero_grad()
model.train()                    # train mode: batch norm layers update their running statistics
with torch.no_grad():            # no graph is built, so no parameters can be updated here
    output_train = model(inputs)
model.eval()                     # eval mode: batch norm layers use the running statistics
output_eval = model(inputs)
loss = criterion(output_eval, targets)

The idea is that I do domain adaptation just by updating the batch norm layers to the new target distribution.
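
As a quick sanity check that this train-mode pass under torch.no_grad() only moves the batch norm running statistics and leaves the learnable weights untouched, I can inspect one of the batch norm layers before and after the pass (a minimal sketch; it assumes the usual affine=True batch norm and just grabs the first BatchNorm2d it finds in the model):

bn = next(m for m in model.modules() if isinstance(m, torch.nn.BatchNorm2d))
mean_before = bn.running_mean.clone()      # running statistics (a buffer, not a parameter)
gamma_before = bn.weight.clone()           # affine scale (a learnable parameter)

model.train()
with torch.no_grad():
    model(inputs)                          # forward only: no graph, no gradients

print(torch.allclose(mean_before, bn.running_mean))   # False: running stats adapted to this batch
print(torch.equal(gamma_before, bn.weight))            # True: learnable weights untouched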

Then, after doing this, let's say I get an accuracy of 60%. Now, if I add these two other lines, I am able to achieve something like 80% accuracy:

loss.backward()
optimizer.step()

Therefore my question is: what happens if I call backward() and step() while in eval mode? I know about the different behaviour of batch norm and dropout layers between train and eval mode, and I know about torch.no_grad() and how gradients are calculated and then parameters updated by the optimizer, but I wasn't able to find any information about my specific problem.

I think that since the model is then set to eval mode, those two lines should be useless, but something clearly happens. Does this have something to do with the affine parameters of the batch norm layers?

UPDATE: OK, I misunderstood something: eval mode does not prevent parameters from being updated, it only changes the behaviour of some layers (batch norm and dropout) during the forward pass, am I right? Therefore with those two lines I am actually training the network, hence the better accuracy. Anyway, does anything change if batch norm's affine is set to True? Are those parameters considered "normal" parameters to be updated during optimizer.step(), or is it different?
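
To spell out what I mean, here is a minimal sketch (assuming no parameters have been frozen beforehand):

model.eval()
print(model.training)                                      # False: BN uses running stats, dropout is off
print(all(p.requires_grad for p in model.parameters()))    # True: nothing is actually frozen

# so backward()/step() would still update the weights even in eval mode;
# to really freeze them one would have to do e.g.
for p in model.parameters():
    p.requires_grad_(False)
# ...or simply never call loss.backward() / optimizer.step()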


Solution

  • eval mode does not prevent parameters from being updated, it only changes the behaviour of some layers (batch norm and dropout) during the forward pass, am I right?

    True.

    Therefore with those two lines I am actually training the network, hence the better accuracy. Anyway, does anything change if batch norm's affine is set to True? Are those parameters considered "normal" parameters to be updated during optimizer.step(), or is it different?

    The BN affine parameters (weight and bias) are registered as regular learnable parameters, so they are updated during the optimizer step just like any other weight. Look at the BatchNorm source (torch/nn/modules/batchnorm.py):

        if self.affine:
            self.weight = Parameter(torch.Tensor(num_features))
            self.bias = Parameter(torch.Tensor(num_features))
        else:
            self.register_parameter('weight', None)
            self.register_parameter('bias', None)
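
    So with affine=True (the default), weight and bias show up in model.parameters(), receive gradients from loss.backward() even when the model is in eval mode, and are then moved by optimizer.step(). A small sketch to illustrate (assuming the optimizer was built from model.parameters()):

        bn = next(m for m in model.modules() if isinstance(m, torch.nn.BatchNorm2d))
        print(any(p is bn.weight for p in model.parameters()))    # True: gamma is a "normal" parameter

        model.eval()
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        print(bn.weight.grad is not None)                          # True: gradients flow in eval mode too

        gamma_before = bn.weight.clone()
        optimizer.step()
        print(torch.equal(gamma_before, bn.weight))                # typically False: step() moved gamma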