python, tensorflow, attention-model

Why does the output of an attention decoder need to be combined with the attention output?


This is from legacy_seq2seq in TensorFlow:

# Merge the input and the previous attentions into one vector of size input_size.
x = linear([inp] + attns, input_size, True)
# Run the RNN.
cell_output, state = cell(x, state)
# Run the attention mechanism.
if i == 0 and initial_state_attention:
  with variable_scope.variable_scope(variable_scope.get_variable_scope(), reuse=True):
    attns = attention(state)
else:
  attns = attention(state)
with variable_scope.variable_scope("AttnOutputProjection"):
  output = linear([cell_output] + attns, output_size, True)
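
As far as I understand, linear here just concatenates its inputs along the feature dimension and applies one learned affine projection, roughly like this NumPy sketch (sizes are made up for illustration):

import numpy as np

# Made-up sizes, just for illustration.
batch, cell_size, attn_size, output_size = 2, 4, 3, 5

cell_output = np.random.randn(batch, cell_size)   # RNN cell output for this step
attn = np.random.randn(batch, attn_size)          # attention context vector

# Concatenate along the feature axis, then apply one affine map W x + b
# (random stand-ins here for the learned weights).
W = np.random.randn(cell_size + attn_size, output_size)
b = np.zeros(output_size)

output = np.concatenate([cell_output, attn], axis=1) @ W + b
print(output.shape)   # (2, 5)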

My question is: why do we need to combine cell_output with attns, rather than just use cell_output as the output?

Thanks


Solution

  • The attention mechanism is there to put more weight on some particular, specific nodes.

    Here cell_output is, mathematically, a matrix; in deep-learning terms it is a representation, a combination of node activations.

    So if, at the end, you want to give higher priority to some of the data, you have to change cell_output in some way. That is exactly what the concatenation (or addition, or dot-product) with the attention output does to the original matrix (cell_output); see the sketch at the end of this answer.

    Let x = 5, and suppose you want x = 7.
    One way is x = x + 2: you have changed the variable x.
    The same kind of operation is applied to your hidden-layer nodes,
    or in your case to cell_output: x plays the role of cell_output
    and 2 plays the role of the attention output.
    

    If you make no change to cell_output at all, attention has no way of entering your output representation.

    You can pass cell_output directly to the final layer without combining it with the attention matrix, i.e. without applying attention at all, but then you should ask yourself why the attention mechanism is needed in a neural network in the first place.
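
    Here is a minimal NumPy sketch of the whole idea, using simple dot-product attention as a stand-in for the library's scoring function and made-up sizes; it illustrates the combine step, not the library's exact code:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # Toy sizes, purely for illustration.
    src_len, hidden = 6, 4

    encoder_states = np.random.randn(src_len, hidden)  # one row per source position
    cell_output = np.random.randn(hidden)              # decoder cell output h_t

    # Dot-product attention: score each source position against the decoder
    # state, normalize the scores, and take the weighted sum as the context.
    scores = encoder_states @ cell_output              # (src_len,)
    weights = softmax(scores)                          # attention distribution
    context = weights @ encoder_states                 # (hidden,)

    # The combine step the question asks about: the context says where to look,
    # the cell output says what the decoder has produced so far; projecting
    # their concatenation (a stand-in for "AttnOutputProjection") lets the
    # final output use both.
    W_out = np.random.randn(2 * hidden, hidden)
    output = np.tanh(np.concatenate([cell_output, context]) @ W_out)
    print(output.shape)                                # (4,)

    If you drop context from the last concatenation, the attention weights computed above have no effect on output at all, which is exactly the point of the question.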