Broadly speaking, the training of word2vec is a process in which words that often occur in the same contexts are clustered together in the vector space. We start with the word vectors placed randomly in that space, and with each iteration more and more clusters form. I think I understand this, but how can we ensure that words that are antonyms, or that rarely appear in the same context, don't end up in clusters that are close by? Also, how can we know that words that are more irrelevant end up farther away than words that are less irrelevant?
To elaborate somewhat on Novak's response:
You seem to regard word2vec as a tool to evaluate semantic meaning. Although much of the result is correlated with meaning, that is not the function of word2vec. Rather, it indicates contextual correlation, which is (somewhat loosely) regarded as "relevance".
When this "relevance" is applied to certain problems, especially when multiple "relevance" hits are required to support a reportable result, then the overall effect is often useful to the problem at hand.
For your case, note that a word and its antonym will often appear near one another, for literary contrast or other emphasis. As such, they are contextually quite relevant to one another. Unless you have some pre-processing that can identify and appropriately alter various forms of negation, you will see such pairs often in your vectorization -- as is appropriate to the tool.
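By way of illustration, one crude form of such pre-processing (a hypothetical sketch, not something your question describes) is to collapse simple negations into single tokens before training, so that "good" and "not_good" are learned as separate words:

```python
# Hypothetical pre-processing sketch: collapse simple "not X" patterns into a
# single token before training, so a word and its negated form get separate vectors.
NEGATORS = {"not", "never", "no"}  # illustrative list, far from exhaustive

def merge_negations(tokens):
    """Turn e.g. ["this", "is", "not", "good"] into ["this", "is", "not_good"]."""
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATORS and i + 1 < len(tokens):
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(merge_negations(["the", "movie", "was", "not", "good"]))
# -> ['the', 'movie', 'was', 'not_good']
```

Even with tricks like this, pairs such as "hot"/"cold" still share contexts heavily, so you should expect many antonyms to remain close together in the vector space.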