Tags: machine-learning, deep-learning, pytorch, batch-normalization, batchnorm

Batchnorm2d Pytorch - Why pass number of channels to batchnorm?


Why do I need to pass the number of channels of the previous layer to the batchnorm? The batchnorm should normalize over each data point in the batch, so why does it need the number of channels?


Solution

  • Batch normalisation has learnable parameters because it includes an affine transformation.

    From the documentation of nn.BatchNorm2d:

    $y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$

    The mean and standard-deviation are calculated per-dimension over the mini-batches and γ and β are learnable parameter vectors of size C (where C is the input size). By default, the elements of γ are set to 1 and the elements of β are set to 0.

    Since the norm is calculated per channel, the parameters γ and β are vectors of size num_channels (one element per channel), which results in an individual scale and shift per channel. As with any other learnable parameter in PyTorch, they need to be created with a fixed size, hence you need to specify the number of channels:

    import torch.nn as nn
    
    batch_norm = nn.BatchNorm2d(10)
    
    # γ (per-channel scale)
    batch_norm.weight.size()
    # => torch.Size([10])
    
    # β (per-channel shift)
    batch_norm.bias.size()
    # => torch.Size([10])
    
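    Because the statistics are computed per channel, you can reproduce what the layer does in training mode by normalising over the batch and spatial dimensions yourself. The following is a minimal sketch; the input shape and tolerance are illustrative assumptions, not part of the original answer:

    import torch
    import torch.nn as nn
    
    batch_norm = nn.BatchNorm2d(3)
    batch_norm.train()  # training mode: normalise with batch statistics
    
    x = torch.randn(4, 3, 8, 8)  # (N, C, H, W); shape chosen for illustration
    out = batch_norm(x)
    
    # One mean and variance per channel, computed over N, H and W.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    manual = (x - mean) / torch.sqrt(var + batch_norm.eps)
    # The default affine parameters are γ=1 and β=0, so no extra scale/shift is needed here.
    
    torch.allclose(out, manual, atol=1e-6)
    # => True
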

    Note: With affine=False no learnable parameters are created, so the number of channels would not strictly be needed, but it is still required in order to keep the interface consistent.
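
    For instance, with affine=False the module creates no γ/β parameters at all; its weight and bias attributes are simply None (a quick illustrative sketch):

    batch_norm = nn.BatchNorm2d(10, affine=False)
    
    batch_norm.weight is None and batch_norm.bias is None
    # => True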