
PyTorch ToTensor scaling to [0,1] discrepancy


If I do this

from torchvision import transforms
from torchvision.datasets import MNIST

mnist_train = MNIST('../data/MNIST', download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                    ]), train=True)

and

mnist_train.data.max()

why do I get 255? I would expect 1, because ToTensor() scales to [0,1], right?

If I do:

for i in range(0, len(mnist_train)):
    print(mnist_train[i][0].max())

then I get values of (almost) 1 for each sample.

Could someone please help me understand this?


Solution

  • When you do

    mnist_train.data
    

    PyTorch gives you the data attribute of mnist_train, which is set inside __init__ when you create the MNIST instance. If you look at the code leading up to that assignment in __init__, no transformation is applied there: data holds the raw uint8 pixel values, hence the maximum of 255.

    OTOH, when you do

    mnist_train[i]
    

    the __getitem__ method of the object is triggered. That method contains an if statement that applies self.transform to the sample when a transform was given, so here you get the transformed (scaled) version.

    Since the common usage is to consume this MNIST dataset (or any other) through torch.utils.data.DataLoader, which fetches samples via __getitem__, you get the scaled values in practice.
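
    The mechanism above can be sketched without torchvision at all. The toy class below (ToyMNIST and scale_to_unit are illustrative names, not torchvision's actual code) mimics the pattern: the raw data attribute is stored untouched, and the transform is applied only inside __getitem__:

    ```python
    class ToyMNIST:
        """Mimics how torchvision's MNIST stores raw data but
        applies the transform only inside __getitem__."""

        def __init__(self, data, transform=None):
            self.data = data          # raw pixel values, never transformed
            self.transform = transform

        def __getitem__(self, index):
            sample = self.data[index]
            if self.transform is not None:   # same if-statement pattern as MNIST.__getitem__
                sample = self.transform(sample)
            return sample

    # stands in for ToTensor()'s scaling of uint8 pixels into [0, 1]
    scale_to_unit = lambda px: px / 255.0

    ds = ToyMNIST(data=[0, 128, 255], transform=scale_to_unit)

    print(max(ds.data))                    # 255 -- attribute access, no transform
    print(max(ds[i] for i in range(3)))    # 1.0 -- __getitem__ applies the transform
    ```

    The same asymmetry explains the question: mnist_train.data bypasses the transform entirely, while mnist_train[i] goes through __getitem__.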