Recently, I have been learning about encoder-decoder networks and the attention mechanism, and I found that many papers and blog posts implement attention on top of RNNs.
I am curious whether other kinds of networks can incorporate attention mechanisms. For example, suppose the encoder is a feedforward neural network and the decoder is an RNN. Can feedforward neural networks without time-series structure use attention mechanisms? If so, please give me some suggestions. Thank you in advance!
In general, feedforward networks treat features as independent; convolutional networks focus on relative location and proximity; RNNs and LSTMs have limited memory and tend to read in one direction.
In contrast, attention (and the Transformer built on it) can gather context about a word from distant parts of a sentence, both before and after the word appears, encoding information that helps capture the word's meaning and its role within the sentence.
There is a good model of a feed-forward network with an attention mechanism here (Raffel & Ellis, "Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems"):
https://arxiv.org/pdf/1512.08756.pdf
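To make this concrete, here is a minimal sketch of the feed-forward attention idea from that paper, written in PyTorch; all class and variable names are my own illustrations, not from the paper's code. The key point is that each timestep's attention score is computed independently by a small learnable function, so the mechanism needs no recurrence at all:

```python
import torch
import torch.nn as nn

class FeedForwardAttention(nn.Module):
    """Sketch of feed-forward attention (after Raffel & Ellis, arXiv:1512.08756).

    Scores e_t = a(h_t) are computed per timestep with no recurrence,
    then softmax-normalized and used to average the inputs.
    """
    def __init__(self, dim):
        super().__init__()
        # Learnable scoring function a(h_t); a single linear layer here.
        self.score = nn.Linear(dim, 1)

    def forward(self, h):
        # h: (batch, seq_len, dim), e.g. features from a feedforward encoder
        e = torch.tanh(self.score(h))        # (batch, seq_len, 1) scores
        alpha = torch.softmax(e, dim=1)      # attention weights over timesteps
        c = (alpha * h).sum(dim=1)           # context vector: (batch, dim)
        return c, alpha.squeeze(-1)

# Usage: the "encoder" is a plain feedforward layer applied per timestep.
batch, seq_len, in_dim, dim = 4, 10, 16, 32
encoder = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU())
x = torch.randn(batch, seq_len, in_dim)
h = encoder(x)                               # no recurrence anywhere
context, weights = FeedForwardAttention(dim)(h)
print(context.shape, weights.shape)          # (4, 32) and (4, 10)
```

The resulting context vector can then be fed to any decoder, including an RNN, which is exactly the encoder/decoder combination you describe.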
Hope this is useful.