keras, deep-learning, nlp, transformer-model, attention-model

Dimension of Query and Key Tensor in MultiHeadAttention


I am confused about the dimensions mentioned for the query and key tensors in the Keras documentation for the MultiHeadAttention layer: https://keras.io/api/layers/attention_layers/multi_head_attention/

query: Query Tensor of shape (B, T, dim).

value: Value Tensor of shape (B, S, dim).

Here I am presuming that T and S correspond to the length of the word sequence fed into the model, which should be the same. So why are they allowed to be unequal?


Solution

  • T and S can be unequal because the query and the key/value pair may come from sequences of different lengths.

    This case arises in the second MultiHeadAttention() layer of the Decoder (the cross-attention layer). There, the key (K) and value (V) inputs come from the Encoder() output, whose length is S, while the query (Q) comes from the first MultiHeadAttention() layer of the Decoder, whose length is the target length T. A minimal sketch of this is shown below.
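For illustration, here is a minimal sketch using tf.keras.layers.MultiHeadAttention. The concrete values of B, T, S, and dim are arbitrary choices for the example, not from the documentation; they just make the T ≠ S case visible in the printed shapes:

```python
import tensorflow as tf

# Assumed example shapes: query length T differs from key/value length S,
# as in the decoder's cross-attention over the encoder output.
B, T, S, dim = 2, 5, 8, 16  # batch, query length, key/value length, feature dim

query = tf.random.normal((B, T, dim))  # e.g. output of the decoder's first attention layer
value = tf.random.normal((B, S, dim))  # e.g. encoder output; key defaults to value

mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=dim)
output, scores = mha(query, value, return_attention_scores=True)

print(output.shape)  # (2, 5, 16)   -> (B, T, dim): one output per query position
print(scores.shape)  # (2, 4, 5, 8) -> (B, num_heads, T, S): each of the T queries
                     #    attends over all S key/value positions
```

The printed shapes show why the docs keep T and S separate: the output always follows the query length T, while the attention weights span the S positions of the key/value sequence.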