python scikit-learn gensim doc2vec hyperparameters

Default values of doc2vec for alpha and min_alpha


Can anybody tell me which default values Doc2Vec() uses for alpha and min_alpha?


Solution

  • The exact defaults for all parameters are listed in the documentation, but a parameter shared with a 'base' class may be shown only in that superclass's docs.

    So when you don't see alpha and min_alpha on the prototype line of the Doc2Vec documentation...

    https://radimrehurek.com/gensim/models/doc2vec.html#gensim.models.doc2vec.Doc2Vec

    ...you can click the link just under it, where it says...

    Bases: gensim.models.word2vec.Word2Vec

    ...to reach its base class, Word2Vec, and find those and many other defaults specified:

    https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec

    Specifically, per the text there...

    class gensim.models.word2vec.Word2Vec(sentences=None, corpus_file=None, vector_size=100, alpha=0.025, window=5, min_count=5, max_vocab_size=None, sample=0.001, seed=1, workers=3, min_alpha=0.0001, sg=0, hs=0, negative=5, ns_exponent=0.75, cbow_mean=1, hashfxn=<built-in function hash>, epochs=5, null_word=0, trim_rule=None, sorted_vocab=1, batch_words=10000, compute_loss=False, callbacks=(), comment=None, max_final_vocab=None)

    ...the defaults are alpha=0.025, min_alpha=0.0001.

    Most users shouldn't need to tinker with these at all: most metaparameter optimization effort should be directed elsewhere.

    In some published work, in some modes of this and related algorithms, I've seen a higher starting alpha of 0.05 or 0.1 used.
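
If you'd rather check the defaults from code than from the docs, the same "look in the base class" idea can be sketched with the standard inspect module. The helper name inherited_defaults below is hypothetical (it is not part of gensim). It assumes the defaults are declared as keyword defaults on some __init__ in the class hierarchy, which matches the Word2Vec signature quoted above, and it only queries gensim if gensim is installed.

```python
import inspect


def inherited_defaults(cls, names):
    """Return {name: default} for each requested parameter name.

    Hypothetical helper (not part of gensim): walks cls and its base
    classes in method-resolution order and reads the keyword defaults
    declared on each class's own __init__. The most-derived class that
    declares a default for a name wins.
    """
    found = {}
    for klass in inspect.getmro(cls):
        init = klass.__dict__.get("__init__")
        if init is None:
            continue  # this class does not define its own __init__
        try:
            params = inspect.signature(init).parameters
        except (TypeError, ValueError):
            continue  # signature not introspectable (e.g. some C types)
        for name in names:
            if name in found:
                continue
            param = params.get(name)
            if param is not None and param.default is not inspect.Parameter.empty:
                found[name] = param.default
    return found


# Optional: only runs when gensim is available in the environment.
try:
    from gensim.models.doc2vec import Doc2Vec
except ImportError:
    Doc2Vec = None

if Doc2Vec is not None:
    # Expected to agree with the documented signature quoted above.
    print(inherited_defaults(Doc2Vec, ["alpha", "min_alpha"]))
```

To try a different starting rate, such as the 0.05 mentioned above, alpha and min_alpha can be passed as keyword arguments when constructing the model.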