
Are the features of Word2Vec independent of each other?


I am new to NLP and am studying Word2Vec, so I don't fully understand the concept yet.

Are the features of Word2Vec independent of each other?

For example, suppose there is a 100-dimensional word2vec model. Are the 100 features then independent of each other? In other words, if the "sequence" of the features is shuffled, does the meaning of the word2vec change?


Solution

  • Word2vec is a 'dense' embedding: the individual dimensions generally aren't independently interpretable. It's just the 'neighborhoods' and 'directions' (not limited to the 100 orthogonal axis dimensions) that have useful meanings.

    So, they're not 'independent' of each other in a statistical sense. But, you can discard any of the dimensions – for example, the last 50 dimensions of all your 100-dimensional vectors – and you still have usable word-vectors. So in that sense they're still independently useful.
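    As a minimal sketch of that "discard the last 50 dimensions" idea, using made-up random arrays as stand-ins for trained 100-dimensional word vectors (with real trained vectors, similarity quality degrades only gradually as dimensions are dropped):

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    # Hypothetical stand-ins for three trained 100-d word vectors
    king, queen, banana = rng.normal(size=(3, 100))

    def cosine(a, b):
        """Cosine similarity between two vectors."""
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Keep only the first 50 dimensions of every vector
    king50, queen50, banana50 = king[:50], queen[:50], banana[:50]

    # The truncated vectors can still be compared the exact same way
    print(cosine(king, queen))      # similarity in the full 100-d space
    print(cosine(king50, queen50))  # similarity in the truncated 50-d space
    ```

    The key point is that the truncation must be applied to every vector in the set, so all comparisons still happen in the same (smaller) space.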

    If you shuffle the order of dimensions the same way for every vector in your set, you've essentially just rotated/reflected all the vectors in the same way. They'll all have different coordinates, but their relative distances will be unchanged, and if "going toward word B from word A" used to vaguely indicate some human-understandable aspect like "largeness", then even after the order-of-dimensions shuffle, "going toward word B from word A" will mean the same thing, because the vectors "thataway" (in the transformed coordinates) are the same as before.
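    This is easy to verify numerically: a permutation of coordinates is an orthogonal transform, so applying the *same* shuffle to every vector preserves all pairwise distances and cosine similarities exactly. A small NumPy check (toy random vectors standing in for a trained set):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(5, 100))  # 5 toy "word vectors", 100 dims each

    perm = rng.permutation(100)          # one shuffle, applied to every vector
    shuffled = vectors[:, perm]

    def cosine(a, b):
        """Cosine similarity between two vectors."""
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Pairwise similarities are identical before and after the shuffle
    for i in range(5):
        for j in range(5):
            assert np.isclose(cosine(vectors[i], vectors[j]),
                              cosine(shuffled[i], shuffled[j]))

    # Euclidean distances are preserved too
    assert np.allclose(
        np.linalg.norm(vectors[0] - vectors[1]),
        np.linalg.norm(shuffled[0] - shuffled[1]),
    )
    ```

    Shuffling each vector with a *different* permutation, by contrast, would destroy these relationships, which is why the answer stresses reordering "the same way for every vector".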