Tags: pytorch, seq2seq, attention-model

I have a question about the "Translation with a Sequence to Sequence Network and Attention" PyTorch tutorial


I am currently learning about seq2seq translation. I am trying to understand and follow the PyTorch tutorial at https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#attention-decoder.

On that page, they talk about the attention technique. I would like to know which technique they use, Luong or Bahdanau? Another question: why do they apply a ReLU layer before the GRU cell? Finally, the red box in the figure is called a context vector, right?

[Figures: attention decoder diagrams from the tutorial, with a red box highlighted]


Solution

  • I would like to know which technique they use, Luong or Bahdanau?

    Luong attention is multiplicative, so the tutorial should be using Bahdanau (additive) attention, since it concatenates the inputs and then applies a linear layer. See http://ruder.io/deep-learning-nlp-best-practices/index.html#attention for more about attention types.
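
    For context, here is a minimal sketch (not the tutorial's code) of the two score functions. The layer names, shapes, and the "general" form of the Luong score are illustrative assumptions:

```python
import torch
import torch.nn as nn

hidden_size = 256
dec_hidden = torch.randn(1, hidden_size)   # decoder state s_t (dummy)
enc_output = torch.randn(1, hidden_size)   # one encoder output h_j (dummy)

# Luong-style "general" score (multiplicative): score = s_t^T W h_j
W_mul = nn.Linear(hidden_size, hidden_size, bias=False)
luong_score = dec_hidden @ W_mul(enc_output).t()                   # shape (1, 1)

# Bahdanau-style score (additive): score = v^T tanh(W [s_t; h_j])
W_add = nn.Linear(2 * hidden_size, hidden_size)
v = nn.Linear(hidden_size, 1, bias=False)
bahdanau_score = v(torch.tanh(W_add(torch.cat((dec_hidden, enc_output), dim=1))))  # (1, 1)
```

    The tutorial's AttnDecoderRNN is in the same concat-then-linear family as the second form: it concatenates two vectors and pushes them through a linear layer to produce the attention weights, rather than taking a dot product.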

    Why do they apply a ReLU layer before the GRU cell?

    This is the activation after the linear layer. I think tanh was used originally, but ReLU later became the preferred choice.
    I think the other ReLU, applied right after the embeddings in the plain (non-attention) decoder, is there by mistake, though: https://github.com/spro/practical-pytorch/issues/4
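
    A rough, self-contained sketch of that step, with dummy tensors standing in for the tutorial's embedded input, attention context, and hidden state:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size = 256
embedded = torch.randn(1, 1, hidden_size)      # embedded decoder input token (dummy)
attn_applied = torch.randn(1, 1, hidden_size)  # attention context vector (dummy)
hidden = torch.randn(1, 1, hidden_size)        # previous GRU hidden state (dummy)

attn_combine = nn.Linear(2 * hidden_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size)

# Concatenate the embedding and the context, project back to hidden_size ...
output = attn_combine(torch.cat((embedded[0], attn_applied[0]), dim=1)).unsqueeze(0)
# ... apply the ReLU in question, and only then feed the GRU cell.
output = F.relu(output)
output, hidden = gru(output, hidden)
print(output.shape)   # torch.Size([1, 1, 256])
```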

    The red box in the figure is called a context vector, right?

    Yes. It is the attention-weighted sum of the encoder outputs; a small sketch follows.
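
    A minimal illustration (dummy scores and encoder outputs, shapes chosen to match the single-batch setup used in the tutorial):

```python
import torch
import torch.nn.functional as F

seq_len, hidden_size = 10, 256
encoder_outputs = torch.randn(seq_len, hidden_size)  # one row per source position (dummy)
scores = torch.randn(1, seq_len)                      # unnormalized attention scores (dummy)

attn_weights = F.softmax(scores, dim=1)               # (1, seq_len), sums to 1
# The context vector is the attention-weighted sum of the encoder outputs:
context = torch.bmm(attn_weights.unsqueeze(0),        # (1, 1, seq_len)
                    encoder_outputs.unsqueeze(0))     # (1, seq_len, hidden_size)
print(context.shape)                                   # torch.Size([1, 1, 256])
```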