In the original paper "Attention Is All You Need", the positional encoding is defined as:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)$$
but in Transformer's model_utils.py, I found that the formula is different at line 53. In the paper, the sin and cos functions alternate depending on whether the dimension index is even or odd, while in the implementation each of them occupies a contiguous half of the dimensions. A rough sketch of the two layouts follows below.
You are right, but I don't think that makes any difference. The representation of each position given by the positional encoding is unique whether you concatenate the sin/cos halves or interleave them in the final vector.
As long as the encoding is unique and we always generate the encoding consistently, the positional information is preserved in the model.
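To illustrate with the sketch above (my own check, not code from the repo): the concatenated layout is just a fixed permutation of the interleaved columns, so the two carry exactly the same information, and every position still gets a distinct vector.

```python
pe_inter = positional_encoding_interleaved(100, 16)
pe_concat = positional_encoding_concat(100, 16)

# The concatenated layout is a fixed reordering of the interleaved columns.
perm = np.concatenate([np.arange(0, 16, 2), np.arange(1, 16, 2)])
assert np.allclose(pe_inter[:, perm], pe_concat)

# Each position still maps to a distinct encoding vector in either layout.
assert len(np.unique(np.round(pe_concat, 8), axis=0)) == 100
```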