Tags: nlp, artificial-intelligence, word2vec, word-embedding, sense2vec

Why are word embeddings with linguistic features (e.g. Sense2Vec) not used?


Given that embedding systems such as Sense2Vec incorporate linguistic features like part-of-speech tags, why are these embeddings not more commonly used?

Across popular work in NLP today, Word2Vec and GloVe are the most commonly used word embedding systems, despite the fact that they only incorporate word-level information and do not include any linguistic features of the words.

For example, in sentiment analysis, text classification, or machine translation tasks, it makes logical sense that performance could be improved if the input incorporated linguistic features as well, particularly when disambiguating words such as "duck" the verb and "duck" the noun.
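For concreteness, here is a minimal sketch (hypothetical vectors, loosely following Sense2Vec's word|POS key convention) of how a sense-tagged lookup differs from a plain word-level one:

```python
import numpy as np

# Hypothetical sense-tagged embedding table (Sense2Vec-style word|POS keys).
sense_vectors = {
    "duck|NOUN": np.array([0.12, -0.48, 0.33]),   # the bird
    "duck|VERB": np.array([-0.51, 0.07, 0.29]),   # to lower one's head
}

# A plain Word2Vec/GloVe table has a single entry covering both senses.
word_vectors = {
    "duck": np.array([-0.20, -0.21, 0.31]),
}

# With sense-tagged keys, "duck" in "feed the duck" and "duck the question"
# can be looked up as two different vectors.
print(sense_vectors["duck|NOUN"])
print(sense_vectors["duck|VERB"])
print(word_vectors["duck"])
```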

Is this thinking flawed? Or is there some other practical reason why these embeddings are not more widely used?


Solution

  • It's a very subjective question. One reason is the POS tagger itself: the tagger is a probabilistic model, so it can add its own errors and confusion to the pipeline.

    For example, say you have dense representations for duck-NOUN and duck-VERB, but at inference time your POS tagger tags "duck" as something else; then you won't even find the vector you trained (the first sketch below illustrates this). Moreover, splitting the word by tag effectively reduces the total number of times your system sees "duck" during training, so one could argue that the resulting representations are weaker.

    To top it off, the main problem that Sense2Vec was addressing, contextualisation of word representations, has since been largely solved by contextual models such as BERT and ELMo, without introducing any of the above problems (second sketch below).
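A minimal sketch of the lookup failure described above, using hypothetical vectors and a hand-rolled lookup helper rather than the actual Sense2Vec API:

```python
import numpy as np

# Hypothetical sense-tagged table, keyed as word|POS (Sense2Vec-style).
sense_vectors = {
    "duck|NOUN": np.array([0.12, -0.48, 0.33]),
    "duck|VERB": np.array([-0.51, 0.07, 0.29]),
}

def lookup(token, pos, fallback=None):
    """Look up a token under the POS tag predicted at inference time.
    If the tagger disagrees with the tag used at training time, the key
    simply does not exist and the word becomes out-of-vocabulary."""
    return sense_vectors.get(f"{token}|{pos}", fallback)

print(lookup("duck", "VERB"))    # found: the trained duck|VERB vector
print(lookup("duck", "PROPN"))   # None: tagger disagreement -> OOV
```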
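And a sketch of the contextual alternative, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (the sentences and the duck_vector helper are illustrative): the same surface form "duck" gets different vectors from its context alone, with no tagger in the loop.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def duck_vector(sentence):
    # Contextual vector for the token "duck" in the given sentence.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("duck")]

v_noun = duck_vector("the duck swam across the pond")
v_verb = duck_vector("he had to duck under the low beam")
cos = torch.cosine_similarity(v_noun, v_verb, dim=0)
print(f"cosine similarity between the two 'duck' vectors: {cos.item():.3f}")
```

Because the two vectors come out different for the two usages, the disambiguation the question asks about happens inside the model itself, without a separate tagger or a sparser, sense-split vocabulary.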