Why is MNIST showing as a list 4 levels deep?

I'm experimenting with some simple PyTorch code on MNIST, and I'm puzzled by how it represents the data; maybe I'm just overlooking something obvious.

Given

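import torch
import torchvision

# batch_size_train is assumed to be defined earlier as an int
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST(
        "data",
        train=True,
        download=True,
        transform=torchvision.transforms.Compose(
            [
                # Convert the PIL image to a float tensor scaled to [0, 1]
                torchvision.transforms.ToTensor(),
                # Standardize with the MNIST mean and standard deviation
                torchvision.transforms.Normalize((0.1307,), (0.3081,)),
            ]
        ),
    ),
    batch_size=batch_size_train,
    shuffle=True,
)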

and

for batch_idx, (data, target) in enumerate(train_loader):
    print(data)

I get

tensor([[[[-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          ...,
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242]]],


        [[[-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          ...,
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242]]],

I was expecting a tensor corresponding to a list three levels deep: a list of images, each of which is a list of rows, each of which is a list of numbers. Or put another way, the innermost [] is a row, the next [] is an image, the outermost [] is the list of images.

But instead it is four levels deep.

Why the extra level?

Solution

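The four levels are, from outermost to innermost:

Batch
Channel
Row
Column

MNIST images are grayscale, so the channel dimension has size 1, and each full batch has shape (batch_size_train, 1, 28, 28). The extra level you are seeing is the channel dimension that ToTensor() adds to every image.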
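
To see the four dimensions without reading nested brackets, it can help to print the shape rather than the tensor itself. The following is a minimal sketch that uses a zero-filled stand-in tensor instead of a real batch from train_loader, so it runs without downloading MNIST. The batch size of 64 is an assumption, since batch_size_train is not shown in the question.

```python
import torch

# Stand-in for one batch from train_loader. The batch size of 64 is an
# assumption; 1 is the single grayscale channel, 28x28 is the image size.
data = torch.zeros(64, 1, 28, 28)

print(data.shape)        # (batch, channel, row, column)
print(data[0].shape)     # one image, still with its channel axis
print(data[0, 0].shape)  # one image as a plain 2-D grid of pixels

# Drop the size-1 channel axis to get the three-level layout
# (images, rows, numbers) described in the question.
images = data.squeeze(1)
print(images.shape)
```

If you want the three-level layout for inspection, squeeze(1) is one way to get it. Convolutional layers such as torch.nn.Conv2d expect the channel axis on batched input, so it is usually best left in place for training.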