I have seen in many Kaggle kernels and tutorials that averaging word embeddings is used to get the embedding of a sentence. But I am wondering whether this is a correct approach, since it discards the positional information of the words in the sentence. Is there a better way to combine embeddings, maybe by combining them hierarchically in a particular way?
If you need a simple yet effective approach, SIF embedding works perfectly fine. It averages the word vectors in a sentence and then removes the projection onto the first principal component, which makes it noticeably better than plain averaging. The code is available online here. Here is the main part:
from sklearn.decomposition import TruncatedSVD

# Fit a rank-1 truncated SVD to find the first principal component of the sentence matrix
# (rand_seed is any fixed integer seed)
svd = TruncatedSVD(n_components=1, random_state=rand_seed, n_iter=20)
svd.fit(all_vector_representation)
pc = svd.components_
# Subtract each sentence's projection onto that first principal component
XX2 = all_vector_representation - all_vector_representation.dot(pc.transpose()) * pc
where all_vector_representation is a matrix holding the averaged word embeddings of each sentence in your dataset (one row per sentence).
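For context, here is a minimal sketch of how all_vector_representation could be built, assuming sentences is a list of tokenized sentences and word_vectors is a pre-trained embedding lookup (both names are placeholders, and the embedding dimension of 300 is just an example):

import numpy as np

# Hypothetical inputs: `sentences` is a list of token lists,
# `word_vectors` maps a token to its pre-trained embedding vector.
dim = 300
all_vector_representation = np.zeros((len(sentences), dim))
for i, tokens in enumerate(sentences):
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if vecs:  # average the word vectors of this sentence
        all_vector_representation[i] = np.mean(vecs, axis=0)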
Other, more sophisticated approaches also exist, such as ELMo and Transformer-based models.
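If you want to try a Transformer-based option, one common route is the sentence-transformers library; this is just a sketch, assuming the library is installed and using the "all-MiniLM-L6-v2" checkpoint as one example model:

# Minimal sketch (assumes `pip install sentence-transformers`);
# the model name is only one commonly used example checkpoint.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentence_embeddings = model.encode(["This is a sentence.", "This is another one."])
print(sentence_embeddings.shape)  # (2, embedding_dim)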