
Value of alpha in gensim word-embedding (Word2Vec and FastText) models?


I just want to know the effect of the value of alpha in gensim Word2Vec and FastText word-embedding models. I know that alpha is the initial learning rate, and that its default value is 0.075, according to Radim's blog.

What if I change this to a somewhat higher value, e.g. 0.5 or 0.75? What would its effect be? Is it even allowed to change it? I have changed it to 0.5 and experimented on large-sized data with D = 200, window = 15, min_count = 5, iter = 10 and workers = 4, and the results are pretty meaningful for the Word2Vec model. With the FastText model, however, the results are a bit scattered, i.e. less related, with unpredictably high or low similarity scores.

Why do these two popular models give such different-quality results on the same data? Does the value of alpha play such a crucial role while building the model?

Any suggestion is appreciated.


Solution

  • The default starting alpha is 0.025 in gensim's Word2Vec implementation.

    In the stochastic gradient descent algorithm for adjusting the model, the effective alpha affects how strong of a correction to the model is made after each training example is evaluated, and will decay linearly from its starting value (alpha) to a tiny final value (min_alpha) over the course of all training.

    Most users won't need to adjust these parameters, or might only adjust them a little, after they have a reliable repeatable way of assessing whether a change improves their model on their end tasks. (I've seen starting values of 0.05 or less commonly 0.1, but never as high as your reported 0.5.)
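To make the decay behavior concrete, here is a minimal sketch of a linear learning-rate schedule like the one described above. In gensim you would simply pass `alpha=` and `min_alpha=` to the `Word2Vec` (or `FastText`) constructor; the toy function below only illustrates the interpolation per epoch, whereas gensim actually decays the rate more finely, across training batches.

```python
# Toy illustration of linear learning-rate decay from `alpha` to `min_alpha`,
# as used over the course of Word2Vec/FastText training. Defaults mirror
# gensim's: starting alpha 0.025, final min_alpha 0.0001.

def effective_alpha(epoch, epochs, alpha=0.025, min_alpha=0.0001):
    """Linearly interpolate from `alpha` at the first epoch to `min_alpha` at the last."""
    progress = epoch / max(epochs - 1, 1)
    return alpha - progress * (alpha - min_alpha)

# Print the per-epoch rates for a 5-epoch run: starts at 0.025, ends at 0.0001.
for e in range(5):
    print(e, effective_alpha(e, 5))
```

In gensim itself the equivalent settings would look like `Word2Vec(sentences, alpha=0.025, min_alpha=0.0001, ...)`. Raising the start to 0.5, as in the question, makes each SGD correction roughly 20x larger than the default, which can cause updates to overshoot and is a plausible reason for the scattered FastText similarities.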