What does "permutation invariant" mean in the context of transformers doing language modelling?
I am reading a research paper called LiLT, where the authors mention that transformers are permutation invariant. What does "permutation invariant" mean in the case of language modelling? Link to paper
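
To make the question concrete, here is a minimal sketch of what I think the claim might mean. This is my own toy code, not anything from the paper: the `self_attention` helper, the identity Q/K/V projections, and the mean pooling are all assumptions I made for illustration. It runs one self-attention step with no positional encoding, shuffles the input tokens, and checks whether the outputs are the same vectors in shuffled order.

```python
import math
import random

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Hypothetical toy layer: single head, identity Q/K/V projections,
    # and deliberately NO positional encoding.
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def close(a, b, tol=1e-9):
    # Compare two vectors up to floating-point noise.
    return all(abs(x - y) < tol for x, y in zip(a, b))

def mean_pool(rows):
    # Average the token vectors into one sequence-level vector.
    return [sum(col) / len(rows) for col in zip(*rows)]

random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
perm = [3, 0, 4, 1, 2]
shuffled = [tokens[i] for i in perm]

out = self_attention(tokens)
out_shuffled = self_attention(shuffled)

# Each token gets the same output vector; only the order of the rows changes.
reordered_same = all(close(out_shuffled[i], out[perm[i]]) for i in range(len(perm)))

# After pooling over tokens, the result does not depend on the order at all.
pooled_same = close(mean_pool(out), mean_pool(out_shuffled))

print(reordered_same, pooled_same)
```

If my understanding is right, both checks should pass, which would mean that without position information the model cannot tell "dog bites man" from "man bites dog". Is that what the paper means, or is "invariant" being used in some stricter sense?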