I have been considering using Word2vec for a problem. I know you can use cosine distance, which is bounded: it is 0 when the hyperpoints are identical, and it has a fixed maximum because cosine similarity spans [-1, 1]. Euclidean distance shares the same minimum of 0. My question is: in practice, what is the maximum euclidean distance two words can achieve when word2vec projects them into the same hyperspace? Can it be estimated mathematically? Is it theoretically infinite?
The training process doesn't necessarily bound where a word-vector winds up, so I believe the euclidean-distance between two word-vectors could become arbitrarily large.
But they'd only get arbitrarily large with arbitrarily many training passes, and perhaps only on certain extreme training corpora. The normal variety of language and the limited number of training passes mean that, in practice, vectors don't get too far from the origin.
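If you want a feel for the magnitudes in a particular model, you can inspect them directly. Here's a minimal sketch, assuming the gensim library and a hypothetical word2vec-format file `vectors.bin`; by the triangle inequality, no two vectors can be farther apart than the sum of the two largest magnitudes:

```python
import numpy as np
from gensim.models import KeyedVectors

# Load any word2vec-format model; "vectors.bin" is a placeholder path.
kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Magnitude of every word-vector: how far each word sits from the origin.
norms = np.linalg.norm(kv.vectors, axis=1)
print(f"min={norms.min():.2f} mean={norms.mean():.2f} max={norms.max():.2f}")

# Triangle inequality: the euclidean-distance between any two vectors
# is at most the sum of the two largest magnitudes in the model.
top_two = np.sort(norms)[-2:]
print(f"distance upper bound for this model: {top_two.sum():.2f}")
```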
It's common to unit-normalize the word-vectors, so that they all have a magnitude of 1.0 (and thus are on the "unit-hypersphere"), before making word-to-word comparisons. If you've done this normalization:
- while the euclidean-distance and cosine-distance will be different values, the rank order of nearest-neighbors will be the same no matter which you use (the sketch below shows why)
- the maximum distance between any two vectors would be 2, for points diametrically opposite each other on the hypersphere
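Here's a minimal sketch of both points, using plain numpy with made-up unit vectors rather than a real model. For unit vectors, the squared euclidean-distance works out to `2 - 2 * cos_sim`, so euclidean-distance is a monotonic function of cosine-distance and peaks at 2 when cosine similarity is -1:

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 50))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize

query, others = vecs[0], vecs[1:]

cos_dist = 1.0 - others @ query                    # cosine distance
euc_dist = np.linalg.norm(others - query, axis=1)  # euclidean distance

# For unit vectors: ||a - b||^2 = 2 - 2*(a . b), i.e. 2 * cos_dist,
# so euclidean-distance is monotonic in cosine-distance -- rankings
# agree, and the maximum (cos_sim = -1) is sqrt(4) = 2.
assert np.allclose(euc_dist, np.sqrt(2.0 * cos_dist))
assert (np.argsort(cos_dist) == np.argsort(euc_dist)).all()
```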