deeplearning4j: online Word2Vec training


Word2vec is a great tool in deeplearning4j. I managed to create vectors for a corpus by following this tutorial.

The question now is how to update the model with new sentences without rebuilding it from scratch.

One thought on this: would the following method help?

public void trainSentence(List<VocabWord> sentence){}

Would that update the model? If yes, how should a sentence be prepared before it is passed to this method?


Solution

  • Yes and no. The documentation here says:

    Weights update after model serialization/deserialization was added. That is, you can update model state with, say, 200GB of new text by calling loadFullModel, adding TokenizerFactory and SentenceIterator to it, and calling fit() on the restored model.

    This means that the model weights can be retrained and updated with a new corpus, but no new words will be added to the vocabulary. A word that appears only in the new text will still have no vector after retraining.

    Check code and Javadoc here.
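
To make the "yes and no" concrete, here is a toy sketch in plain Java of what a frozen vocabulary implies. It is not deeplearning4j code: the class `FrozenVocabSketch`, its methods, and the fixed `0.01` update are hypothetical stand-ins chosen only for illustration. It assumes that tokens outside the existing vocabulary are simply skipped during retraining, which is one plausible reading of the documentation quoted above.

```java
import java.util.*;

// Toy stand-in for a trained word-vector model. NOT deeplearning4j code.
class FrozenVocabSketch {
    // word -> vector; the key set is fixed at construction time.
    private final Map<String, double[]> vectors = new HashMap<>();

    public FrozenVocabSketch(Collection<String> vocab, int dim) {
        for (String w : vocab) {
            vectors.put(w, new double[dim]);
        }
    }

    // Lower-cases and splits on whitespace, keeping only tokens already in the vocabulary.
    public List<String> prepare(String sentence) {
        List<String> known = new ArrayList<>();
        for (String tok : sentence.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (vectors.containsKey(tok)) {
                known.add(tok);
            }
        }
        return known;
    }

    // Placeholder "training" step: nudges the vector of each known token.
    // Unknown tokens never reach this loop, so the vocabulary cannot grow.
    public void uptrain(String sentence) {
        for (String tok : prepare(sentence)) {
            double[] v = vectors.get(tok);
            for (int i = 0; i < v.length; i++) {
                v[i] += 0.01;
            }
        }
    }

    public boolean hasWord(String w) { return vectors.containsKey(w); }
    public int vocabSize() { return vectors.size(); }
    public double[] vector(String w) { return vectors.get(w); }

    public static void main(String[] args) {
        FrozenVocabSketch model = new FrozenVocabSketch(Arrays.asList("cat", "dog"), 3);
        model.uptrain("The cat chased the ferret");
        System.out.println("vocab size: " + model.vocabSize());
        System.out.println("has 'ferret': " + model.hasWord("ferret"));
    }
}
```

In this sketch the vector for "cat" changes after `uptrain`, while "ferret" stays unknown. If new words need vectors, the practical consequence is that the vocabulary has to be rebuilt, which means training again on the combined corpus.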