deep-learning, neural-network, pytorch

Cannot figure out dense layers dimensions to run the neural network


I am trying to build a multi-layer neural network. My training data has the shape:

train[0][0].shape 
(4096,)

Below are my dense layers:

from collections import OrderedDict

import torch
from torch import nn

n_out = 8
net = nn.Sequential(OrderedDict([
                            ('hidden_linear', nn.Linear(4096, 1366)),
                            ('hidden_activation', nn.Tanh()),
                            ('hidden_linear', nn.Linear(1366, 456)),
                            ('hidden_activation', nn.Tanh()),
                            ('hidden_linear', nn.Linear(456, 100)),
                            ('hidden_activation', nn.Tanh()),
                            ('output_linear', nn.Linear(100, n_out))
                            ]))

I am using cross-entropy as the loss function. The problem occurs when I train the model with the code below:

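loss_fn = nn.CrossEntropyLoss()  # cross-entropy loss, as stated above
learning_rate = 0.001
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate)
n_epochs = 40

for epoch in range(n_epochs):
    for snds, labels in final_train_loader:
        # Flatten each sample in the batch to a 1-D feature vector
        outputs = net(snds.view(snds.shape[0], -1))
        loss = loss_fn(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Report the loss of the last batch in this epoch
    print("Epoch: %d, Loss: %f" % (epoch, float(loss)))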

The error I receive is a matrix multiplication error:

 RuntimeError: mat1 and mat2 shapes cannot be multiplied (100x4096 and 456x100)

I have the dimensions wrong but cannot figure out how to get them right.


Solution

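  • The OrderedDict contains three Linear layers under the same key, hidden_linear (and likewise three nn.Tanh layers under hidden_activation). Dictionary keys are unique, so each later entry overwrites the earlier one and the network ends up with only Linear(456, 100), Tanh and the output layer. The 4096-feature input is therefore fed straight into Linear(456, 100), which is the shape mismatch reported in the error. To make it work, give each layer a distinct name: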

    from collections import OrderedDict

    import torch
    from torch import nn

    n_out = 8
    inp = torch.rand(100, 4096)  # dummy batch: 100 samples, 4096 features each
    net = nn.Sequential(OrderedDict([
                                ('hidden_linear0', nn.Linear(4096, 1366)),
                                ('hidden_activation0', nn.Tanh()),
                                ('hidden_linear1', nn.Linear(1366, 456)),
                                ('hidden_activation1', nn.Tanh()),
                                ('hidden_linear2', nn.Linear(456, 100)),
                                ('hidden_activation2', nn.Tanh()),
                                ('output_linear', nn.Linear(100, n_out))
                                ]))
    net(inp)  # now it works! The output has shape (100, 8)
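
To see the key collision without running the network, here is a minimal sketch in plain Python. It assumes nothing about PyTorch: the layers are stood in for by strings, and find_duplicate_names is a hypothetical helper written for this illustration, not part of any library.

```python
from collections import Counter, OrderedDict

# Same (name, layer) pairs as in the question. The layers are plain strings
# here so the sketch runs without PyTorch installed.
layers = [
    ("hidden_linear", "Linear(4096, 1366)"),
    ("hidden_activation", "Tanh()"),
    ("hidden_linear", "Linear(1366, 456)"),
    ("hidden_activation", "Tanh()"),
    ("hidden_linear", "Linear(456, 100)"),
    ("hidden_activation", "Tanh()"),
    ("output_linear", "Linear(100, 8)"),
]


def find_duplicate_names(pairs):
    """Hypothetical helper: return names that occur more than once, in order."""
    counts = Counter(name for name, _ in pairs)
    return [name for name, count in counts.items() if count > 1]


# Building the OrderedDict keeps one entry per key: the first position,
# but the last value assigned to that key.
collapsed = OrderedDict(layers)

for name, layer in collapsed.items():
    print(name, "->", layer)

print("duplicated names:", find_duplicate_names(layers))
```

Only three of the seven entries survive, and the first one is the 456-to-100 layer, which matches the shapes in the error message. Running a check like this on the list of pairs before handing it to nn.Sequential would catch the problem early.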