Tags: python, neural-network, pytorch, convolution

Why does nn.Conv1d work on a 2D feature map [b, c, h, w]?


I am wondering why Conv1d works on a 2D feature map (batch, channel, height, width).

An nn.Conv1d(channel, channel, kernel_size=(1,1)) works when I pass it a 2D feature map, but it gives a different result from nn.Conv2d(channel, channel, kernel_size=1).

I want to know why Conv1d works and what a 2D kernel size means in a 1D convolution.


Solution

  • "I want to know why Conv1d works and what a 2D kernel size means in a 1D convolution"

    There is no reason for it not to work. Under the hood, a "convolution" is just a sliding dot product between the kernel and a patch of the input, whether that dot product is between a matrix and a vector, two matrices, two vectors, etc. Simply put, the real distinction between 1D and 2D convolution is how many directions the kernel is free to move across the input. A 1D convolution can move along one direction only, the temporal dimension of the input (note that the kernel itself may be a vector or a matrix; that doesn't matter). A 2D convolution, on the other hand, can move along two dimensions of the input, height and width, which together make up its spatial dimensions. If this still seems confusing, have a look at the GIFs below.

    1D Convolution in action:

    Note: This is a 1D convolution with a 3x3 kernel. Notice how it only moves down the input, along the temporal dimension.

    [GIF: 1D conv]

    2D Convolution in action:

    Note: This is a 2D convolution with a 3x3 kernel. Notice how it moves along both the width and the height of the input, i.e. its spatial dimensions.

    [GIF: 2D conv]

    It should now be clear what the actual difference between 1D and 2D convolution is, and why the two can produce different results for the same input.
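The "sliding dot product" picture from the GIFs can be sketched in plain Python, without PyTorch. This is a minimal illustration under assumptions: a single channel, stride 1, no padding, and no kernel flip (PyTorch's convolution layers also compute cross-correlation). The helpers dot, conv1d_rows and conv2d are hypothetical names for this sketch, not PyTorch APIs.

```python
# Minimal sketch (plain Python, no PyTorch) of 1D vs 2D sliding.
# Assumptions: one channel, stride 1, no padding, no kernel flip.


def dot(patch, kernel):
    """Sum of element-wise products of two equally sized 2D lists."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


def conv1d_rows(x, kernel):
    """1D-style convolution: the kernel spans the full width of x,
    so it can only slide down the rows (the temporal axis)."""
    kh = len(kernel)
    assert len(kernel[0]) == len(x[0]), "kernel must cover the full width"
    return [dot(x[i:i + kh], kernel) for i in range(len(x) - kh + 1)]


def conv2d(x, kernel):
    """2D convolution: the kernel slides along both height and width."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(x) - kh + 1):          # slide down
        row = []
        for j in range(len(x[0]) - kw + 1):   # slide right
            patch = [r[j:j + kw] for r in x[i:i + kh]]
            row.append(dot(patch, kernel))
        out.append(row)
    return out


ones = [[1] * 3 for _ in range(3)]                      # hypothetical 3x3 all-ones kernel
x1 = [[r * 3 + c for c in range(3)] for r in range(5)]  # 5 rows x 3 columns
x2 = [[r * 5 + c for c in range(5)] for r in range(5)]  # 5 rows x 5 columns

print(conv1d_rows(x1, ones))  # [36, 63, 90]
print(conv2d(x2, ones))       # [[54, 63, 72], [99, 108, 117], [144, 153, 162]]
```

The 1D version returns a flat list because the kernel can only slide along the rows; the 2D version returns a grid because it slides along both axes. When the input is exactly as wide as the kernel (as with x1), conv2d has nowhere to move horizontally and returns the same numbers as conv1d_rows, just as a one-column grid.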