I want to know whether the matrix of the attn_output_weight
can demonstrate the relationship between every word pair in the input sequence.
In my project, I drew the heat map based on this output, and it looks like this:
However, I can hardly see any information in this heat map. I referred to other people's work, and their heat maps look like this. At the very least, the diagonal of the matrix should have a deep color.
So I wonder whether my method of drawing the heat map is correct (i.e. directly using the output of the attn_output_weight
). If this is not the correct way, could you please tell me how to draw the heat map?
It seems your range of values is rather limited. In the target example, the values lie in the range [0, 1]
, since each row represents a softmax distribution. This is visible from the definition of attention:

Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
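As a sanity check, here is a minimal sketch in plain NumPy (with hypothetical random scores) showing that a row-wise softmax always produces rows summing to 1, which is what a well-formed attention map should look like:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical raw attention scores for a 4-token sequence
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))

attn = softmax(scores)            # each row: a distribution over the 4 source tokens
print(attn.sum(axis=-1))          # every row sums to 1
```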
I suggest you normalize each row / column (depending on the attention implementation you are using) and then visualize the attention maps in the range [0, 1]
. You can do this using the vmin
and vmax
arguments of matplotlib's plotting functions.
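For example, here is a minimal sketch of such a visualization. The attention matrix and the token labels here are hypothetical stand-ins; with a PyTorch model you would instead pass in something like attn_output_weights[0].detach().numpy():

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so the figure is saved to a file
import matplotlib.pyplot as plt

# Hypothetical attention matrix; replace with your own weights
rng = np.random.default_rng(0)
raw = rng.random((6, 6))
attn = raw / raw.sum(axis=1, keepdims=True)  # row-normalize so each row sums to 1

tokens = ["the", "cat", "sat", "on", "the", "mat"]  # hypothetical input tokens

fig, ax = plt.subplots()
im = ax.imshow(attn, cmap="viridis", vmin=0, vmax=1)  # fix the color range to [0, 1]
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens)
ax.set_yticks(range(len(tokens)))
ax.set_yticklabels(tokens)
fig.colorbar(im, ax=ax)
fig.savefig("attention_heatmap.png")
```

Fixing vmin=0 and vmax=1 is what makes different heat maps comparable: without it, matplotlib rescales the colors to each matrix's own min/max, which can make a nearly uniform attention map look deceptively structured.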
If this doesn't solve the problem, maybe add a snippet of code containing the model you are using and the visualization script.