I am confused about the dimensions given for the query and value tensors in the Keras documentation for the MultiHeadAttention layer (https://keras.io/api/layers/attention_layers/multi_head_attention/):
query: Query Tensor of shape (B, T, dim).
value: Value Tensor of shape (B, S, dim).
Here I am presuming that T and S both correspond to the length of the word sequence fed into the model, which should be the same. So why are they allowed to be unequal?
This is useful when the query and the key/value pair have different sequence lengths. This case arises in the second MultiHeadAttention() layer in the Decoder: there, the key (K) and value (V) inputs come from the Encoder output, while the query (Q) comes from the first MultiHeadAttention() layer of the Decoder. Since the encoder and decoder sequences can have different lengths, T and S need not be equal.
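A minimal sketch of this cross-attention case, with hypothetical sizes (decoder length T=5, encoder length S=8) chosen just for illustration:

```python
import tensorflow as tf
from tensorflow.keras.layers import MultiHeadAttention

# Hypothetical sizes: batch B, decoder length T, encoder length S, model dim.
B, T, S, dim = 2, 5, 8, 16

query = tf.random.normal((B, T, dim))  # stands in for the decoder's self-attention output
value = tf.random.normal((B, S, dim))  # stands in for the encoder output (also used as key)

mha = MultiHeadAttention(num_heads=4, key_dim=8)
out = mha(query=query, value=value)    # key defaults to value when not passed

print(out.shape)  # (2, 5, 16) — the output keeps the query's sequence length T
```

Note that the output sequence length follows the query (T), not the key/value (S): each of the T query positions attends over all S encoder positions.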