I want a "convolutional" layer where the filter size is 1. I can achieve this using (option 1)
tflearn.layers.conv.conv_1d( input, n_output_channels, 1 )
or roll my own using (option 2)
tf.matmul( input, tf.tile( weights, [batch_size, 1, 1] ) )
where input has dimensions [batch,sequence,n_input_channels] and weights is [1,n_input_channels,n_output_channels].
The performance of these two options seems roughly equivalent, but I would guess both have inefficiencies: option 1 presumably carries overhead from expecting a "real" convolution, and the tile operation in option 2 seems like it should be unnecessary. Is there a smarter way I could be doing this?
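For reference, here is a minimal repro of the two options in a TF1-style graph (the batch size, sequence length, and channel counts below are just illustrative values, not from my real model):

```python
import tensorflow as tf
import tflearn

# Illustrative shapes only.
batch_size, seq_len = 32, 100
n_input_channels, n_output_channels = 64, 128

inputs = tf.placeholder(tf.float32, [batch_size, seq_len, n_input_channels])

# Option 1: width-1 convolution via tflearn.
conv_out = tflearn.layers.conv.conv_1d(inputs, n_output_channels, 1)

# Option 2: tile the [1, in, out] weights across the batch, then batched matmul.
weights = tf.Variable(
    tf.truncated_normal([1, n_input_channels, n_output_channels], stddev=0.1))
matmul_out = tf.matmul(inputs, tf.tile(weights, [batch_size, 1, 1]))
```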
If you plan on using the GPU, your best bet is probably to stick to native cuDNN operations, which in this case means convolutions.
NVIDIA does not provide many details on their implementation, but I would be surprised if they did not have dedicated implementations for the common, small sizes used in NNs, including kernels with a 1x1 spatial extent. The same most probably applies to other NN-specialized libraries, such as Intel MKL-DNN on CPU.
Your concern should only apply when using generic, non-NN convolution libraries, or badly optimized ones. I don't think that is the case for TensorFlow or any other major DL library and its dependencies. (It could be interesting to check in Eigen.)
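Concretely, the native route is just your option 1: a width-1 kernel handed to the stock conv op, which dispatches to cuDNN on GPU. A sketch, with illustrative channel counts:

```python
import tensorflow as tf

# Illustrative shapes only.
n_input_channels, n_output_channels = 64, 128

inputs = tf.placeholder(tf.float32, [None, None, n_input_channels])

# Kernel shape is [filter_width, in_channels, out_channels]; width 1 makes
# this a per-timestep linear map, equivalent in effect to the matmul version.
kernel = tf.Variable(
    tf.truncated_normal([1, n_input_channels, n_output_channels], stddev=0.1))

outputs = tf.nn.conv1d(inputs, kernel, stride=1, padding='SAME')
```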