pytorch, conv-neural-network

How to specify the batch dimension in a conv2D layer with PyTorch


I have a dataset of 600x600 grayscale images, grouped in batches of 50 images by a dataloader.

My network has a convolution layer with 16 filters, followed by max pooling with a 6x6 kernel, and then a dense (linear) layer. The flattened output after the conv2D and pooling should have out_channels*width*height/(maxpool_kernel_W*maxpool_kernel_H) = 16*600*600/(6*6) = 160000 features per sample, times the batch size of 50.

However, when I try to do a forward pass I get the following error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (80000x100 and 160000x1000). I verified that the data is formatted correctly as [batch, n_channels, height, width] (so [50, 1, 600, 600] in my case).

Logically, the output should be a 50x160000 matrix, but apparently it is formatted as an 80000x100 matrix. It seems like torch is multiplying the matrices along the wrong dimensions. If anyone understands why, please help me understand too.

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import FakeData
from torchvision.transforms import ToTensor

# get data (using a fake dataset generator)
dataset = FakeData(size=500, image_size=(1, 600, 600), transform=ToTensor())
training_data, test_data = random_split(dataset,[400,100])
train_dataloader = DataLoader(training_data, batch_size=50, shuffle=True)
test_dataloader  = DataLoader(test_data, batch_size=50, shuffle=True)

net = nn.Sequential(
    nn.Conv2d(
        in_channels=1,
        out_channels=16,
        kernel_size=5,
        padding=2,
    ),
    nn.ReLU(),  
    nn.MaxPool2d(kernel_size=6),
    nn.Linear(160000, 1000),
    nn.ReLU(),
)

optimizer = optim.Adam(net.parameters(), lr=1e-3,)

epochs = 10
for i in range(epochs):
    for (x, _) in train_dataloader:
        optimizer.zero_grad()

        # make sure the data is in the right shape
        print(x.shape) # returns torch.Size([50, 1, 600, 600])

        # error happens here, at the first forward pass
        output = net(x)

        criterion = nn.MSELoss()
        loss = criterion(output, x)
        loss.backward()
        optimizer.step()

Solution

  • If you inspect your model's inference layer by layer, you will notice that nn.MaxPool2d returns a 4D tensor shaped (50, 16, 100, 100). Since nn.Linear operates only on the last dimension of its input, it treats this tensor as 50*16*100 = 80000 row vectors of length 100, which is exactly the 80000x100 mat1 reported in the error. There are different ways to reduce spatial dimensionality (flattening, average-pooling, max-pooling). For instance, if you flatten the spatial dimensions, you get a tensor of shape (50, 16*100*100), i.e. (50, 160_000), as you expected. To do that, insert an nn.Flatten layer between the pooling and the linear layer (a shape check and an average-pooling variant follow the code below).

    net = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
                        nn.ReLU(),  
                        nn.MaxPool2d(kernel_size=6),
                        nn.Flatten(),
                        nn.Linear(160000, 1000),
                        nn.ReLU())
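
    You can confirm the fix by pushing a dummy batch through the layers one at a time and printing the intermediate shapes. A minimal sketch (the dummy tensor and the expected shapes assume the 600x600 grayscale setup above):

    x = torch.rand(50, 1, 600, 600)   # dummy batch shaped like the real data
    for layer in net:
        x = layer(x)
        print(type(layer).__name__, tuple(x.shape))
    # Conv2d    (50, 16, 600, 600)
    # ReLU      (50, 16, 600, 600)
    # MaxPool2d (50, 16, 100, 100)
    # Flatten   (50, 160000)
    # Linear    (50, 1000)
    # ReLU      (50, 1000)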
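
    If you would rather reduce the spatial dimensions by pooling instead of flattening the full 160000-feature map, one common option is global average pooling, which collapses each channel to a single value before the linear layer. A sketch of that variant (the architecture below is an assumption for illustration, not part of the original question; note the linear layer shrinks to 16 inputs):

    net_avg = nn.Sequential(
        nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=6),
        nn.AdaptiveAvgPool2d(1),  # (50, 16, 1, 1): one value per channel
        nn.Flatten(),             # (50, 16)
        nn.Linear(16, 1000),      # far fewer weights than Linear(160000, 1000)
        nn.ReLU(),
    )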