
attn_output_weights in MultiheadAttention


I want to know whether the matrix in attn_output_weights can show the relationship between every word pair in the input sequence. In my project, I drew a heat map based on this output: (heat map image)

However, I can hardly see any information in this heat map. Referring to other people's work, their heat maps look different: at the very least, the diagonal of the matrix should be deeply colored. (example heat map image)

So I wonder whether my method of drawing the heat map is correct (i.e., directly using the output of attn_output_weights). If this is not the correct way, could you please tell me how to draw the heat map?
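For reference, a minimal sketch of how these weights are typically obtained from PyTorch's `nn.MultiheadAttention` (the dimensions and random input here are illustrative, not from the asker's model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# illustrative sizes, not the asker's actual model
embed_dim, num_heads, seq_len = 16, 4, 6
mha = nn.MultiheadAttention(embed_dim, num_heads)

# default layout for nn.MultiheadAttention is (seq_len, batch, embed_dim)
x = torch.randn(seq_len, 1, embed_dim)

# attn_output_weights is averaged over heads by default and has
# shape (batch, seq_len, seq_len); each row is a softmax distribution
attn_output, attn_output_weights = mha(x, x, x)
weights = attn_output_weights[0].detach().numpy()  # (seq_len, seq_len)
```

Since each row already comes out of a softmax, every row of `weights` should sum to 1.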


Solution

  • It seems your range of values is rather limited. In the target example the values lie in [0, 1], since each row represents a softmax distribution. This is visible from the definition of attention:

    Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

    I suggest you normalize each row / column (depending on the attention implementation you are using) and then visualize the attention maps in the range [0, 1]. You can fix that range using the vmin and vmax arguments in matplotlib plots.

    If this doesn't solve the problem, consider adding a snippet of the model you are using and the visualization script.
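The steps above can be sketched as follows (the raw scores here are random placeholders standing in for the model's attention logits):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# hypothetical raw attention scores for a 6-token sequence
rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 6))

# normalize each row into a distribution over key positions
attn = softmax(scores, axis=-1)

# pin the color scale to [0, 1] with vmin/vmax so faint
# differences are not stretched over the whole colormap
fig, ax = plt.subplots()
im = ax.imshow(attn, cmap="viridis", vmin=0.0, vmax=1.0)
fig.colorbar(im, ax=ax)
ax.set_xlabel("key position")
ax.set_ylabel("query position")
fig.savefig("attention_map.png")
```

If the weights already come out of a softmax (as with `attn_output_weights`), the normalization step is a no-op, but fixing `vmin`/`vmax` still keeps the color scale comparable across plots.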