x = linear([inp] + attns, input_size, True)
# Run the RNN.
cell_output, state = cell(x, state)
# Run the attention mechanism.
if i == 0 and initial_state_attention:
  with variable_scope.variable_scope(
      variable_scope.get_variable_scope(), reuse=True):
    attns = attention(state)
else:
  attns = attention(state)

# Project the concatenated cell output and attention context to the output size.
with variable_scope.variable_scope("AttnOutputProjection"):
  output = linear([cell_output] + attns, output_size, True)
My question is: why do we need to combine cell_output with attns, rather than just use cell_output as the output?
Thanks
The attention mechanism is needed to put more weight on some special or specific nodes.
Here your cell_output is a matrix in mathematical terms, and a representation (a combination of nodes) in deep-learning terms.
So, at the end, if you want to give more priority to some of the data, you have to make some change to your cell_output, and that is what we do by applying a concatenation, addition, or dot-product operation to the original matrix (cell_output).
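To make this concrete, here is a minimal NumPy sketch of one decoding step under assumed shapes, using simple dot-product scoring; the names W_out and b_out are hypothetical stand-ins for the parameters that linear([cell_output] + attns, output_size, True) learns:

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_size, attn_size, output_size, T = 4, 4, 3, 5

    cell_output = rng.standard_normal(hidden_size)        # decoder RNN output at this step
    encoder_states = rng.standard_normal((T, attn_size))  # states the decoder attends to

    # Score each encoder state against the decoder state, then take a
    # softmax-weighted sum to get the attention context vector (attns).
    scores = encoder_states @ cell_output                 # shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                              # attention distribution over steps
    attns = weights @ encoder_states                      # shape (attn_size,)

    # "AttnOutputProjection": concatenate the two and project. The output now
    # depends on both the decoder state and the attended encoder positions.
    W_out = rng.standard_normal((output_size, hidden_size + attn_size))  # hypothetical weights
    b_out = np.zeros(output_size)
    output = W_out @ np.concatenate([cell_output, attns]) + b_out

The concatenation is what lets the projection mix information from the attended encoder positions into the final output.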
Let x = 5,
and suppose you want to make x = 7.
Then you can do x = x + 2 (one way).
That means you have made a change to your x variable.
You do the same operation to apply attention to your hidden-layer nodes, or in your case cell_output:
here x is cell_output and 2 is the attention output.
If you don't make any change to your cell_output, how would attention ever enter your output representation?
You can pass cell_output directly to the final layer without combining it with the attention matrix, i.e. without applying attention at all. But then you should ask yourself why the attention mechanism is needed in a neural network in the first place!
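For contrast, a sketch of that attention-free alternative, with a hypothetical weight matrix W_plain: the projection sees only the decoder state, so no encoder position can be emphasized at this step.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_size, output_size = 4, 3
    cell_output = rng.standard_normal(hidden_size)

    # Project cell_output alone: valid, but the output representation carries
    # no attention information about the encoder states.
    W_plain = rng.standard_normal((output_size, hidden_size))  # hypothetical weights
    output_no_attn = W_plain @ cell_output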