I am confused about the dimensions given for the query and value tensors in the Keras documentation for the MultiHeadAttention layer (https://keras.io/api/layers/attention_layers/multi_head_attention/):
query: Query Tensor of shape (B, T, dim).
value: Value Tensor of shape (B, S, dim).
Here I am presuming that T and S both correspond to the length of the word sequence fed into the model, which should be the same. So why are they allowed to be unequal?
This is useful when the query and the key/value pair have different sequence lengths. This case arises in the second MultiHeadAttention() layer in the Decoder: there, the key (K) and value (V) inputs come from the Encoder output, while the query (Q) comes from the first MultiHeadAttention() layer of the Decoder. Since the encoder and decoder sequences can have different lengths, T and S need not be equal.
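A minimal sketch of this cross-attention case, with hypothetical sizes (decoder length T=5, encoder length S=8) chosen just for illustration:

```python
import tensorflow as tf
from tensorflow.keras.layers import MultiHeadAttention

# Hypothetical sizes: batch B, decoder length T, encoder length S, model dim.
B, T, S, dim = 2, 5, 8, 16

query = tf.random.normal((B, T, dim))  # stands in for the decoder's self-attention output
value = tf.random.normal((B, S, dim))  # stands in for the encoder output (also used as key)

mha = MultiHeadAttention(num_heads=4, key_dim=8)
out = mha(query=query, value=value)    # key defaults to value when not passed

print(out.shape)  # (2, 5, 16) — the output keeps the query's sequence length T
```

Note that the output sequence length follows the query (T), not the key/value (S): each of the T query positions attends over all S encoder positions.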