Tags: pytorch, conv-neural-network

What does model.eval() do for a batch normalization layer?


Why does the test data use the mean and variance of all the training data? Is it to keep the distribution consistent? What is the difference in the BN layer's behavior between model.train() and model.eval()?


Solution

  • It fixes the mean and variance used for normalization to the estimates accumulated during training, which are stored in running_mean and running_var. See the PyTorch documentation.

    As noted there, the implementation is based on the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Because the running statistics are accumulated over the whole training set, they give (assuming the training and test data are similarly distributed) a better estimate of the mean/variance for the unseen test set than the statistics of a single test batch would. A short sketch of this behavior follows after this list.

    A similar question has also been asked here: What does model.eval() do?
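
As a minimal sketch (not from the original answer, and using arbitrary feature count and batch statistics purely for illustration), the snippet below shows how a BatchNorm1d layer updates running_mean / running_var in train mode and then reuses those stored statistics, rather than the current batch's statistics, once model.eval() is called:

```python
import torch
import torch.nn as nn

# A single BatchNorm1d layer over 3 features (arbitrary choice for illustration).
bn = nn.BatchNorm1d(num_features=3)

# Training mode: each forward pass normalizes with the *batch* statistics and
# updates running_mean / running_var via an exponential moving average
# (controlled by the `momentum` argument, default 0.1).
bn.train()
for _ in range(100):
    x = torch.randn(32, 3) * 2.0 + 5.0   # batches with mean ~5, std ~2
    _ = bn(x)

print(bn.running_mean)   # approaches ~[5, 5, 5]
print(bn.running_var)    # approaches ~[4, 4, 4]

# Eval mode: the layer normalizes with the stored running_mean / running_var
# instead of the current batch's statistics, so the output for a sample no
# longer depends on which other samples happen to be in the same batch.
bn.eval()
x_test = torch.randn(4, 3) * 2.0 + 5.0
y = bn(x_test)
print(y.mean(dim=0))     # roughly zero, since the running stats match the data
```

This is why eval-mode outputs are deterministic per sample: the normalization constants are frozen estimates from training, not recomputed from the test batch.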