In an attempt to understand how BatchNorm1d works in PyTorch, I tried to match the output of the BatchNorm1d operation on a 2D tensor with manually normalizing it. The manual output seems to be scaled down by a factor of 0.9747. Here's the code (note that affine is set to False):
import torch
import torch.nn as nn
from torch.autograd import Variable

X = torch.randn(20, 100) * 5 + 10
X = Variable(X)
B = nn.BatchNorm1d(100, affine=False)
y = B(X)

# Manually normalize one column (column 1) of X
mu = torch.mean(X[:, 1])
var_ = torch.var(X[:, 1])
sigma = torch.sqrt(var_ + 1e-5)
x = (X[:, 1] - mu) / sigma

# The ratio below should be equal to one
print(x.data / y[:, 1].data)
Output is:
0.9747
0.9747
0.9747
....
Doing the same thing for BatchNorm2d works without any issues (a quick sketch of that check is below). How does BatchNorm1d calculate its output?
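For concreteness, here is a sketch of the analogous BatchNorm2d check I mean; the 4D input shape (20, 100, 8, 8) is just an assumed example:

import torch
import torch.nn as nn

X = torch.randn(20, 100, 8, 8) * 5 + 10
B = nn.BatchNorm2d(100, affine=False)
y = B(X)

# Manually normalize channel 1 using its statistics over the batch and spatial dims
mu = torch.mean(X[:, 1])
var_ = torch.var(X[:, 1])
sigma = torch.sqrt(var_ + 1e-5)
x = (X[:, 1] - mu) / sigma

print((x / y[:, 1]).flatten()[:5])  # these ratios come out very close to 1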
Found out the reason: torch.var uses Bessel's correction (dividing by N-1 rather than N) when calculating the variance, whereas BatchNorm normalizes with the biased estimate. With a batch size of 20 that accounts for the factor sqrt(19/20) ≈ 0.9747. Passing unbiased=False to torch.var gives identical values.
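A quick way to verify this (column 1 is picked arbitrarily, as in the question): repeat the manual normalization with the biased variance and compare it against the BatchNorm1d output; the leftover factor seen earlier is exactly sqrt((N-1)/N) for N = 20.

import torch
import torch.nn as nn

X = torch.randn(20, 100) * 5 + 10
B = nn.BatchNorm1d(100, affine=False)
y = B(X)

# Biased variance (divide by N), matching what BatchNorm uses for the current batch
mu = torch.mean(X[:, 1])
var_ = torch.var(X[:, 1], unbiased=False)
x = (X[:, 1] - mu) / torch.sqrt(var_ + 1e-5)

print(torch.allclose(x, y[:, 1]))  # True
# The earlier scaling factor is exactly sqrt((N - 1) / N) for N = 20:
print((19 / 20) ** 0.5)  # 0.9746794...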