
How to get back hyperparameters from a trained word2vec model in gensim?


I have a trained word2vec model which I need to train further with more data. I want to use the same hyperparameters that were used to train the existing model for the new model as well, but I don't want to hardcode them. Is there a method I can use to get the hyperparameters used while training the existing model? I am using Gensim word2vec.


Solution

  • Any full Word2Vec model has every metaparameter that was supplied at its initial creation somewhere in its object properties.

    It's almost always stored on the model itself, under the exact same name as the corresponding constructor parameter. So model.window will return the window, and so on, and thus you can create a new model by pulling each value from the old one, as in the sketch below.
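
    For instance, a minimal sketch of that pattern, assuming gensim 4.x attribute names and placeholder names for the saved file and new corpus (old.model, new_corpus):

        from gensim.models import Word2Vec

        old = Word2Vec.load("old.model")  # previously-trained model

        # Each constructor parameter is readable under the same name:
        print(old.vector_size, old.window, old.min_count, old.sg)

        # Build a fresh model reusing the old model's metaparameters:
        new_model = Word2Vec(
            sentences=new_corpus,        # your new training texts
            vector_size=old.vector_size,
            window=old.window,
            min_count=old.min_count,
            sg=old.sg,
            hs=old.hs,
            negative=old.negative,
            sample=old.sample,
            alpha=old.alpha,
            min_alpha=old.min_alpha,
            epochs=old.epochs,
            workers=old.workers,
            seed=old.seed,
        )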

    Note that continuing training on an already-trained model involves a lot of thorny tradeoffs.

    For example, calling .build_vocab(..., update=True) on an existing model won't apply min_count consistently against the combined word totals from all prior calls, but only against the totals in the latest 'batch'.
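
    To make that concrete, here's an illustrative sketch of the incremental-update flow (new_sentences is a placeholder for your additional texts, assumed to be an in-memory list of token lists):

        from gensim.models import Word2Vec

        model = Word2Vec.load("old.model")

        # update=True extends the existing vocabulary, but min_count is
        # checked only against word totals within new_sentences, not the
        # combined totals from all training so far.
        model.build_vocab(new_sentences, update=True)

        model.train(
            new_sentences,
            total_examples=len(new_sentences),
            epochs=model.epochs,
        )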

    The proper learning-rate (the alpha to min_alpha decay) for incremental updates isn't well-defined by theory or rules of thumb. And if the vocabulary and word balance in the new texts mainly train some words, not all, those recently-updated words can be pulled arbitrarily out of comparable alignment with earlier words that didn't get more training. (The underlying optimization method, stochastic gradient descent, is best-grounded when all training texts receive equal training attention, without any subset being intensively trained later.)
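
    If you experiment anyway, train() does accept explicit start_alpha/end_alpha overrides; the values below are arbitrary placeholders to show the knob, not a recommendation:

        # Arbitrary assumed values: no theory pins down the right decay
        # for an incremental update (the default initial alpha is 0.025).
        model.train(
            new_sentences,
            total_examples=len(new_sentences),
            epochs=model.epochs,
            start_alpha=0.005,
            end_alpha=0.0005,
        )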