As the title says, if I set the number of hidden nodes in my PyTorch neural network to anything different from the number of input nodes, it returns the error below.
RuntimeError: mat1 and mat2 shapes cannot be multiplied (380x10 and 2x10)
I think the architecture is incorrectly coded, but I'm relatively new to PyTorch and neural networks, so I can't spot the mistake. Any help is greatly appreciated; I've included the code below.
class FCN(nn.Module):
    def __init__(self, N_INPUT, N_OUTPUT, N_HIDDEN, N_LAYERS):
        super().__init__()
        activation = nn.Tanh
        self.fcs = nn.Sequential(*[
            nn.Linear(N_INPUT, N_HIDDEN),
            activation()])
        self.fch = nn.Sequential(*[
            nn.Sequential(*[
                nn.Linear(N_INPUT, N_HIDDEN),
                activation()]) for _ in range(N_LAYERS-1)])
        self.fce = nn.Linear(N_INPUT, N_HIDDEN)

    def forward(self, x):
        x = self.fcs(x)
        x = self.fch(x)
        x = self.fce(x)
        return x

torch.manual_seed(123)
pinn = FCN(2, 2, 10, 8)
If the architecture is instead defined as pinn = FCN(2, 2, 2, 8), no error is raised, but the network does not perform well.
Please let me know if you need any more information, and thank you!
The error you're getting occurs because the output of your first layer (fcs) has dimension N_HIDDEN (which is 10), while the hidden layers in fch have input dimension N_INPUT (which is 2). To fix this, ensure that the input size of each layer matches the output size of the previous layer. In your code:
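As a minimal sketch of why that multiplication fails: the (380, 10) activations coming out of fcs are fed into a Linear(2, 10) layer inside fch, which expects 2-dimensional inputs. (The batch size of 380 here is just taken from the error message.)

```python
import torch
import torch.nn as nn

x = torch.randn(380, 10)      # output of fcs: batch of 380, N_HIDDEN = 10 features
bad_layer = nn.Linear(2, 10)  # first layer inside fch: expects N_INPUT = 2 features

try:
    bad_layer(x)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied (380x10 and 2x10)
```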
class FCN(nn.Module):
    def __init__(self, N_INPUT, N_OUTPUT, N_HIDDEN, N_LAYERS):
        super().__init__()
        activation = nn.Tanh
        self.fcs = nn.Sequential(
            nn.Linear(N_INPUT, N_HIDDEN),
            activation()
        )
        self.fch = nn.Sequential(*[
            nn.Sequential(
                nn.Linear(N_HIDDEN, N_HIDDEN),  # adjust input size to N_HIDDEN
                activation()
            ) for _ in range(N_LAYERS - 1)
        ])
        self.fce = nn.Linear(N_HIDDEN, N_OUTPUT)  # output layer maps to N_OUTPUT

    def forward(self, x):
        x = self.fcs(x)
        x = self.fch(x)
        x = self.fce(x)
        return x
Finally, to get good performance you should experiment with the hidden size (not just 2 or 10; also try 100 or 1000), the number of layers (start with 1 or 2, not 8), and the learning rate of the optimizer.
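As a rough sketch of that tuning advice, a training setup with the corrected class might look like the following. Note the Adam optimizer, the learning rate, the loss, and the random toy data are my own assumptions for illustration, not from the question:

```python
import torch
import torch.nn as nn

class FCN(nn.Module):
    """Fully connected network with the corrected layer sizes."""
    def __init__(self, N_INPUT, N_OUTPUT, N_HIDDEN, N_LAYERS):
        super().__init__()
        activation = nn.Tanh
        self.fcs = nn.Sequential(nn.Linear(N_INPUT, N_HIDDEN), activation())
        self.fch = nn.Sequential(*[
            nn.Sequential(nn.Linear(N_HIDDEN, N_HIDDEN), activation())
            for _ in range(N_LAYERS - 1)
        ])
        self.fce = nn.Linear(N_HIDDEN, N_OUTPUT)

    def forward(self, x):
        return self.fce(self.fch(self.fcs(x)))

torch.manual_seed(123)
model = FCN(2, 2, 100, 2)  # wider but shallower, per the advice above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer / lr
loss_fn = nn.MSELoss()

x = torch.randn(380, 2)  # toy data; replace with your own inputs/targets
y = torch.randn(380, 2)

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(model(x).shape)  # torch.Size([380, 2]) -- the shapes now line up
```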