I am a total newbie to neural networks, using PyTorch to create a VAE model. I've used a bit of TensorFlow before, but I have no idea what "in_channels" and "out_channels" are as arguments to nn.Conv2d/nn.Conv1d.
Disclaimers aside, my model currently takes in a DataLoader with batch size 128, where each input is a 248 by 46 tensor (so each batch is a 128 x 248 x 46 tensor).
My encoder looks like this right now -- I chopped it down so I could focus on where the error was coming from.
class Encoder(nn.Module):
    def __init__(self, latent_dim):
        super(Encoder, self).__init__()
        self.latent_dim = latent_dim
        self.conv1 = nn.Conv2d(in_channels=248, out_channels=46, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print(x.size())
        x = F.relu(self.conv1(x))
        return x
The Conv2d layer was meant to reduce the 248 by 46 input into a 50 by 46 tensor. However, I get this error:
RuntimeError: Given groups=1, weight of size [46, 248, 9, 9], expected input[1, 128, 248, 46] to have 248 channels, but got 128 channels instead
...even though I print x.size() and it displays as torch.Size([128, 248, 46]).
I am unsure a) why the error shows that the layer is adding an extra dimension to x, and b) whether I am even understanding channels correctly. Should 46 be the real number of channels? Why doesn't PyTorch simply request my input size as a tuple or something, like in=(248, 46)?
Or c) whether this is an issue with the way I loaded my data into the model. I have a numpy array data of shape (-1, 248, 46) and started training my model as follows:
tensor_data = torch.from_numpy(data)
dataset = TensorDataset(tensor_data, tensor_data)
train_dl = DataLoader(dataset, batch_size=128, shuffle=True)
...
for epoch in range(20):
    for x_train, y_train in train_dl:
        x_train = x_train.to(device).float()
        optimizer.zero_grad()
        x_pred, mu, log_var = vae(x_train)
        bce_loss = train.BCE(y_train, x_pred)
        kl_loss = train.KL(mu, log_var)
        loss = bce_loss + kl_loss
        loss.backward()
        optimizer.step()
Any thoughts appreciated!
In PyTorch, nn.Conv2d assumes the input (mostly image data) is shaped like [B, C_in, H, W], where B is the batch size, C_in is the number of channels, and H and W are the height and width of the image. The output has a similar shape [B, C_out, H_out, W_out]. Here, C_in and C_out are in_channels and out_channels, respectively. (H_out, W_out) is the output image size, which may or may not equal (H, W), depending on the kernel size, the stride, and the padding.
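For example, here is a minimal sketch of that shape convention (the layer parameters are arbitrary, not taken from your model):

import torch
import torch.nn as nn

# a batch of 128 single-channel 248 x 46 "images": [B, C_in, H, W]
x = torch.randn(128, 1, 248, 46)

# in_channels must equal C_in of the input; out_channels is whatever you choose
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, stride=1, padding=1)

y = conv(x)
print(y.shape)  # torch.Size([128, 8, 248, 46]), i.e. [B, C_out, H_out, W_out]

# each spatial dimension follows:
# H_out = floor((H + 2 * padding - kernel_size) / stride) + 1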
However, it is confusing to apply conv2d to reduce [128, 248, 46] inputs to [128, 50, 46]. Are they image data with height 248 and width 46? If so, you can reshape the inputs to [128, 1, 248, 46] and use in_channels = 1 and out_channels = 1 in conv2d.
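A minimal sketch of that fix, reusing the kernel size, stride, and padding from your encoder (assuming those are the values you want to keep); the unsqueeze adds the channel dimension:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, latent_dim):
        super(Encoder, self).__init__()
        self.latent_dim = latent_dim
        # one input channel, one output channel; the 248 x 46 data lives in H and W
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        x = x.unsqueeze(1)         # [128, 248, 46] -> [128, 1, 248, 46]
        x = F.relu(self.conv1(x))  # [128, 1, 248, 46] -> [128, 1, 50, 46]
        return x

If you later want more feature maps, just increase out_channels; the spatial output size (50, 46) is determined only by the kernel size, stride, and padding.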