I am trying to find the dimensions of an image as it passes through a convolutional neural network, at each layer. For instance, if max-pooling or a convolution is applied, I'd like to know the shape of the output at that layer, for every layer. I know I can use the formula nOut = (nIn + 2p - f) / s + 1, but applying it by hand would be too tedious and error-prone given the size of the PyTorch model. Is there a simple way to do this, perhaps a visualization tool or script?
You can use the torchinfo library: https://github.com/TylerYep/torchinfo
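It is published on PyPI, so pip install torchinfo is enough to set it up.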
Let's take their example:
from torchinfo import summary

model = ConvNet()   # your model, any nn.Module
batch_size = 16
summary(model, input_size=(batch_size, 1, 28, 28))
Here (1, 28, 28) is the size of a single input image, i.e. (channels, height, width); PyTorch expects inputs in (batch, channels, height, width) order.
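If ConvNet isn't already defined on your side, here is a minimal sketch of my own (not the exact model from the torchinfo README, but written so that it reproduces the layer shapes and parameter counts printed below):

import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)   # 28x28 -> 24x24
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)  # 12x12 -> 8x8
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)                  # 20 channels * 4 * 4 = 320
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))                    # 24x24 -> 12x12
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))   # 8x8 -> 4x4
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)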
The library will print:
===================================================================================
Layer (type:depth-idx)      Input Shape       Output Shape      Param #   Mult-Adds
===================================================================================
SingleInputNet              --                --                --        --
├─Conv2d: 1-1               [7, 1, 28, 28]    [7, 10, 24, 24]   260       1,048,320
├─Conv2d: 1-2               [7, 10, 12, 12]   [7, 20, 8, 8]     5,020     2,248,960
├─Dropout2d: 1-3            [7, 20, 8, 8]     [7, 20, 8, 8]     --        --
├─Linear: 1-4               [7, 320]          [7, 50]           16,050    112,350
├─Linear: 1-5               [7, 50]           [7, 10]           510       3,570
===================================================================================
Total params: 21,840
Trainable params: 21,840
Non-trainable params: 0
Total mult-adds (M): 3.41
===================================================================================
Input size (MB): 0.02
Forward/backward pass size (MB): 0.40
Params size (MB): 0.09
Estimated Total Size (MB): 0.51
===================================================================================
I think the 7 in this output is wrong, though; with the call above it should be 16, the batch_size passed to summary.
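As a sanity check, the formula from the question reproduces the spatial sizes in that table. Here is a small helper of my own (the 24 -> 12 step assumes a 2x2 max-pool between the two convolutions, which is consistent with the shapes shown):

def conv_out(n_in, f, p=0, s=1):
    # nOut = (nIn + 2p - f) / s + 1
    return (n_in + 2 * p - f) // s + 1

print(conv_out(28, f=5))        # 24 -> matches Conv2d: 1-1 ([7, 10, 24, 24])
print(conv_out(24, f=2, s=2))   # 12 -> the 2x2 max-pool, input to Conv2d: 1-2
print(conv_out(12, f=5))        # 8  -> matches Conv2d: 1-2 ([7, 20, 8, 8])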