I'm a beginner with PyTorch and I'm trying to implement a student-teacher architecture by initializing two networks with different hidden sizes from the same class. It seems that initializing the first network influences the second one. More specifically, I get different losses on the student network when I initialize the teacher network first, even though I'm training the student network independently of the teacher.
My NN class uses a Linear layer followed by a BatchNorm1d layer, and I'm initializing the BatchNorm weights using nn.init.uniform_. So I'm guessing this is what causes the first initialization to influence the second: either the BatchNorm layer or the Linear layer is keeping some running statistics from the first initialization.
I've tried resetting the running stats on the BatchNorm layer using reset_running_stats(), but that didn't change anything. Any ideas on how to solve this? Thanks.
Guaranteeing reproducible results with neural networks is quite hard because of the sheer amount of randomness involved. However, one way to limit the sources of randomness is to set seeds.
This can be done in PyTorch with:
import torch
seed = 42  # any integer of your choice
torch.manual_seed(seed)  # seeds PyTorch's global random number generator
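You were probably getting different results depending on the order of initialization because both networks draw their initial weights from the same random number generator: PyTorch's global default RNG. Constructing the first network consumes random numbers and advances the generator's state, so the second network starts from a different state than it would if it were built alone.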
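To see the effect in isolation, here is a minimal sketch. The helper name first_weight and the layer sizes are made up for illustration. It builds the same nn.Linear layer twice from the same seed, once on its own and once after another layer has been constructed first, and compares the resulting weights.

```python
import torch
import torch.nn as nn


def first_weight(build_other_first: bool) -> torch.Tensor:
    """Return the initial weights of a Linear(4, 8) layer (hypothetical helper)."""
    torch.manual_seed(0)
    if build_other_first:
        # Constructing any randomly initialized layer draws from the global RNG.
        nn.Linear(4, 16)
    layer = nn.Linear(4, 8)
    return layer.weight.detach().clone()


alone = first_weight(build_other_first=False)
after_other = first_weight(build_other_first=True)
alone_again = first_weight(build_other_first=False)

# Building another layer first changes the weights this layer receives.
order_matters = not torch.equal(alone, after_other)
# Same seed and same construction order give identical weights.
reproducible = torch.equal(alone, alone_again)
print(order_matters, reproducible)
```

Nothing is shared between the two layers themselves; the only link is the position of the global RNG at the moment each one is created.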
When dealing with multiple networks, try setting the seed right before instantiating each model, so that each one starts from the same RNG state no matter which is created first. Something like:
torch.manual_seed(seed)
student = StudentNetwork()
torch.manual_seed(seed) # same seed as previous call
teacher = TeacherNetwork()
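As a rough end-to-end check, the sketch below assumes a network similar to the one described in the question: a Linear layer followed by a BatchNorm1d layer whose weights are initialized with nn.init.uniform_. The class name Net, the helper build_student, and all sizes are hypothetical stand-ins, not the asker's actual code. It builds the student with and without a teacher created first, re-seeding before each instantiation, and checks that the student's parameters come out the same.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical stand-in for the network class in the question."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.fc = nn.Linear(10, hidden_size)
        self.bn = nn.BatchNorm1d(hidden_size)
        # Random BatchNorm scale, as described in the question.
        nn.init.uniform_(self.bn.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.bn(self.fc(x))


def build_student(seed: int, teacher_first: bool) -> Net:
    if teacher_first:
        torch.manual_seed(seed)
        Net(hidden_size=64)  # teacher, built only to consume random numbers
    torch.manual_seed(seed)  # re-seed right before the student
    return Net(hidden_size=16)


student_a = build_student(seed=0, teacher_first=False)
student_b = build_student(seed=0, teacher_first=True)

# Compare every parameter and buffer of the two students.
student_matches = all(
    torch.equal(p, q)
    for p, q in zip(student_a.state_dict().values(), student_b.state_dict().values())
)
print(student_matches)
```

If the losses still differ after this, the remaining randomness is probably in the training itself (for example dropout or shuffled batches), so it may help to re-seed right before training the student as well.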