RuntimeError: Given input size: (64x1x1). Calculated output size: (64x0x0). Output size is too small

My model is:

def forward(self, x):
    x = self.first_bn(x)
    x = self.selu(x)

    x0 = self.block0(x)
    y0 = self.avgpool(x0).view(x0.size(0), -1)
    y0 = self.fc_attention0(y0)
    y0 = self.sig(y0).view(y0.size(0), y0.size(1), -1)
    y0 = y0.unsqueeze(-1)
    x = x0 * y0 + y0

    x = nn.MaxPool2d(2)(x)

    x2 = self.block2(x)
    y2 = self.avgpool(x2).view(x2.size(0), -1)
    y2 = self.fc_attention2(y2)
    y2 = self.sig(y2).view(y2.size(0), y2.size(1), -1)
    y2 = y2.unsqueeze(-1)
    x = x2 * y2 + y2

    x = nn.MaxPool2d(2)(x)

    x4 = self.block4(x)
    y4 = self.avgpool(x4).view(x4.size(0), -1)
    y4 = self.fc_attention4(y4)
    y4 = self.sig(y4).view(y4.size(0), y4.size(1), -1)
    y4 = y4.unsqueeze(-1)
    x = x4 * y4 + y4

    x = nn.MaxPool2d(2)(x)

    x = self.bn_before_gru(x)
    x = self.selu(x)
    x = x.squeeze(-2)
    x = x.permute(0, 2, 1)
    x, _ = self.gru(x)
    x = x[:, -1, :]
    x = self.fc1_gru(x)
    x = self.fc2_gru(x)

    return x

def _make_attention_fc(self, in_features, l_out_features):
    l_fc = []
    l_fc.append(nn.Linear(in_features=in_features, out_features=l_out_features))
    return nn.Sequential(*l_fc)

How can I resolve this error? RuntimeError: Given input size: (64x1x1). Calculated output size: (64x0x0). Output size is too small


  • The primary issue lies in your input size.

    If you examine the SpecRNet architecture, you'll notice that it includes some MaxPool2d modules.

    Let's consider an example where we input a tensor with the size (8, 1, 64, 64).

    Here are the outputs of each layer within the SpecRNet.

    INPUT:  torch.Size([8, 1, 64, 64])
    first_bn(x):  torch.Size([8, 1, 64, 64])
    selu(x):  torch.Size([8, 1, 64, 64])
    block0(x):  torch.Size([8, 20, 32, 32]) ######
    avgpool(x0).view(x0.size(0), -1):  torch.Size([8, 20])
    fc_attention0(y0):  torch.Size([8, 20])
    sig(y0).view(y0.size(0), y0.size(1), -1):  torch.Size([8, 20, 1])
    unsqueeze(-1):  torch.Size([8, 20, 1, 1])
    x0 * y0 + y0:  torch.Size([8, 20, 32, 32])
    MaxPool2d(2)(x):  torch.Size([8, 20, 16, 16]) ######
    block2(x):  torch.Size([8, 64, 8, 8]) ######
    avgpool(x2).view(x2.size(0), -1):  torch.Size([8, 64])
    fc_attention2(y2):  torch.Size([8, 64])
    sig(y2).view(y2.size(0), y2.size(1), -1):  torch.Size([8, 64, 1])
    unsqueeze(-1):  torch.Size([8, 64, 1, 1])
    x2 * y2 + y2:  torch.Size([8, 64, 8, 8])
    MaxPool2d(2)(x):  torch.Size([8, 64, 4, 4]) ######
    block4(x):  torch.Size([8, 64, 2, 2]) ######
    avgpool(x4).view(x4.size(0), -1):  torch.Size([8, 64])
    fc_attention4(y4):  torch.Size([8, 64])
    sig(y4).view(y4.size(0), y4.size(1), -1):  torch.Size([8, 64, 1])
    unsqueeze(-1):  torch.Size([8, 64, 1, 1])
    x4 * y4 + y4:  torch.Size([8, 64, 2, 2])
    MaxPool2d(2)(x):  torch.Size([8, 64, 1, 1]) ######
    bn_before_gru(x):  torch.Size([8, 64, 1, 1])
    selu(x):  torch.Size([8, 64, 1, 1])
    squeeze(-2):  torch.Size([8, 64, 1])
    permute(0, 2, 1):  torch.Size([8, 1, 64])
    gru(x):  torch.Size([8, 1, 128])
    fc1_gru(x):  torch.Size([8, 128])
    fc2_gru(x):  torch.Size([8, 1])
    OUTPUT:  torch.Size([8, 1])

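    In this trace, the spatial dimensions are halved by each of block0, block2, and block4, and again by each MaxPool2d(2) call (the lines marked with ######).

    That makes six halvings in total, so each spatial dimension of the input should be 2^6 = 64 for the final feature map to come out as 1x1, as it does above. With a smaller input, a pooling layer ends up receiving a 1x1 feature map and computes a 0x0 output, which is what the error message reports.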
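The halving arithmetic can be sketched in plain Python. This is only an illustration, not part of SpecRNet: the helper name `trace_spatial_size` is hypothetical, and it assumes every one of the six stages rounds down when it halves (the trace above only shows sizes that divide evenly).

```python
def trace_spatial_size(size, num_halvings=6):
    """Return the spatial size before and after each halving stage.

    Assumption: each stage (block0, MaxPool2d, block2, MaxPool2d,
    block4, MaxPool2d) halves the size with floor division.
    """
    sizes = [size]
    for _ in range(num_halvings):
        size //= 2  # one halving stage
        sizes.append(size)
    return sizes


print(trace_spatial_size(64))  # [64, 32, 16, 8, 4, 2, 1]
print(trace_spatial_size(32))  # [32, 16, 8, 4, 2, 1, 0]
```

Under that assumption, an input of 32 reaches 1 before the last stage, and the next halving would yield 0, which matches the (64x1x1) to (64x0x0) sizes in the error.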

    Also, your model configuration is defined as:

    from typing import Dict

    def get_specrnet_config(input_channels: int) -> Dict:
        return {
            "filts": [input_channels, [input_channels, 20], [20, 64], [64, 64]],
            "nb_fc_node": 64,
            "gru_node": 64,
            "nb_gru_layer": 2,
            "nb_classes": 1,
        }

    specrnet_config = get_specrnet_config(input_channels=1)

    This means the model expects a single input channel.

    In summary, your input should have the shape (batch_size, 1, 64, 64).
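One way to catch this earlier is to validate the shape before calling the model. The sketch below is a hypothetical helper (`check_input_shape` is not part of SpecRNet or PyTorch) that works on a plain tuple, so with a tensor you would pass `tuple(x.shape)`. It only checks the channel count and the lower bound of 64 discussed above.

```python
def check_input_shape(shape, channels=1, min_size=64):
    """Return a list of problems with a (batch, channels, height, width) shape.

    Assumptions: the model expects `channels` input channels and needs
    at least `min_size` in each spatial dimension (six halvings).
    """
    if len(shape) != 4:
        return [f"expected 4 dimensions, got {len(shape)}"]
    _, c, h, w = shape
    problems = []
    if c != channels:
        problems.append(f"expected {channels} channel(s), got {c}")
    if h < min_size or w < min_size:
        problems.append(f"height and width must be at least {min_size}, got {h}x{w}")
    return problems


print(check_input_shape((8, 1, 64, 64)))  # []
print(check_input_shape((8, 1, 32, 32)))  # one problem: spatial size too small
```

An empty list means the shape passes both checks; otherwise each entry describes one mismatch.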