I want to know the effect of the alpha value in gensim's word2vec and fastText word-embedding models. I know that alpha is the initial learning rate, and that its default value is 0.075, according to Radim's blog.
What if I change this to a somewhat higher value, e.g. 0.5 or 0.75? What would the effect be, and is changing it even allowed? I did change it to 0.5 and experimented on a large dataset with D = 200, window = 15, min_count = 5, iter = 10, and workers = 4, and the results are quite meaningful for the word2vec model. With the fastText model, however, the results are rather scattered: the nearest words are less related, and the similarity scores swing unpredictably between high and low.
Why do two popular models give such different results on the same data? Does the value of alpha play such a crucial role while building the model?
Any suggestions are appreciated.
The default starting alpha is 0.025 in gensim's Word2Vec implementation. In the stochastic gradient descent algorithm that adjusts the model, the effective alpha determines how strong a correction is made to the model after each training example is evaluated, and it decays linearly from its starting value (alpha) to a tiny final value (min_alpha) over the course of all training.
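That linear decay can be sketched in a few lines of plain Python. This is a simplified model of the schedule, not gensim's exact internals (which also step the rate epoch by epoch), but it shows how the effective rate interpolates between alpha and min_alpha:

```python
def effective_alpha(start_alpha, min_alpha, progress):
    """Linearly decayed learning rate.

    progress is the fraction of total training completed, in [0, 1]:
    0.0 at the first example, 1.0 at the last.
    """
    return start_alpha - (start_alpha - min_alpha) * progress

print(effective_alpha(0.025, 0.0001, 0.0))  # 0.025 -- the full starting rate
print(effective_alpha(0.025, 0.0001, 0.5))  # halfway between the two
print(effective_alpha(0.025, 0.0001, 1.0))  # ~min_alpha at the very end
```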
Most users won't need to adjust these parameters, or might only adjust them a little, after they have a reliable, repeatable way of assessing whether a change improves their model on their end tasks. (I've seen starting values of 0.05, or less commonly 0.1, but never as high as your reported 0.5.)