
Why TorchVision's GoogLeNet has this strange "normalization"?


I'm reading the source code of TorchVision's GoogLeNet, and I find these lines strange and can't figure them out.

def _transform_input(self, x: Tensor) -> Tensor:
    if self.transform_input:
        x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
        x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
        x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
        x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
    return x

I know that the ImageNet dataset has mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225], so this looks like some kind of "normalization", but it is clearly not (x - mean) / std; it looks more like x * std + mean. I also don't understand where the 0.5 terms come from.

Can anyone explain this code?


Solution

  • This was done to match TensorFlow's way of preprocessing the input image. In the pull request that added GoogLeNet to TorchVision, the author explains that they matched the preprocessing done by TensorFlow. Here is the commit that added the normalization shown in the question.

    The author who contributed GoogLeNet to TorchVision wrote:

    I've updated the code to match the structure required for the TensorFlow weights. Also added the input normalization used for the Inception v3 model.
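To see why the transform takes this form: if the input x was normalized with ImageNet statistics, i.e. x = (raw - mean) / std, then x * (std / 0.5) + (mean - 0.5) / 0.5 simplifies to (raw - 0.5) / 0.5. In other words, the code undoes the ImageNet normalization and re-applies TensorFlow's Inception-style scaling to [-1, 1]. A minimal scalar sketch (plain Python, using the per-channel constants from the question) checks this algebra:

```python
# ImageNet per-channel statistics, as used by torchvision transforms
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

raw = 0.7  # an example raw pixel value in [0, 1]
for m, s in zip(mean, std):
    x = (raw - m) / s                    # standard ImageNet normalization
    y = x * (s / 0.5) + (m - 0.5) / 0.5  # the _transform_input arithmetic
    tf_style = (raw - 0.5) / 0.5         # TensorFlow Inception-style scaling
    assert abs(y - tf_style) < 1e-9      # both routes give the same value
```

So a model with transform_input=True expects inputs normalized with ImageNet mean/std, and internally converts them to the [-1, 1] range the original TensorFlow weights were trained with.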