I have trained a doc2vec model with gensim and would like to import it into Deeplearning4j in order to deploy that model.
For word2vec models, I know that this is possible by saving the model with

model.wv.save_word2vec_format("word2vec.bin", binary=True)

and importing it in Java with

Word2Vec w2vModel = WordVectorSerializer.readWord2VecModel("word2vec.bin");
Is there a similar way to import a doc2vec model?
The save_word2vec_format() method saves just the word-vectors, not the full model.
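For a Doc2Vec model this matters even more: the per-document vectors aren't included at all. A minimal sketch of the difference, assuming gensim 4.x attribute names (model.dv; older releases called it model.docvecs), with example filenames chosen here for illustration:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["some", "words"], tags=["doc0"]),
        TaggedDocument(words=["more", "words"], tags=["doc1"])]
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=5)

# Writes only the word vectors, in the classic word2vec format
model.wv.save_word2vec_format("word_vectors.bin", binary=True)

# The per-document vectors live separately and can be exported the same way
model.dv.save_word2vec_format("doc_vectors.bin", binary=True)

Either file is just id-to-vector pairs, though; neither carries the internal weights needed to infer vectors for unseen documents.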
If you were to use Gensim's .save() to save the full model, it'd use Python's native serialization (pickle), so any Java code reading it would have to understand that format before rearranging the relevant properties into DL4J objects.
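For reference, the pickle-based path looks like this on the Python side; the resulting file(s) aren't something Java can load directly:

model.save("doc2vec_full.model")               # pickle-based; large arrays may be split into separate files
reloaded = Doc2Vec.load("doc2vec_full.model")  # only loadable from Python/gensim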
I don't see anything in the docs for DL4J's ParagraphVectors class suggesting it can read Gensim-formatted models, so I doubt there's any built-in support.
It's theoretically possible that some Python code could be written to dump all the relevant subparts of the model in forms amenable to reading in Java, then patch them into a DL4J model, or that Java code could be written to understand the Python-serialized objects, but either approach would require some familiarity with both the Gensim and DL4J source code.
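As a rough illustration of the export side only (not a working converter), something like the following could dump the pieces a Java reader would need. The helper name and filenames are hypothetical, attribute names assume gensim 4.x with negative-sampling training (syn1neg only exists in that case), and the DL4J side would still need custom code to reassemble a ParagraphVectors model from these files:

import json
import numpy as np

def dump_doc2vec_parts(model, prefix):
    # Word and document vectors in word2vec text format, which existing
    # Java parsers (e.g. DL4J's WordVectorSerializer) already understand
    model.wv.save_word2vec_format(prefix + "_words.txt", binary=False)
    model.dv.save_word2vec_format(prefix + "_docs.txt", binary=False)
    # Output-layer weights, needed if the Java side should infer vectors
    # for new documents
    np.savetxt(prefix + "_syn1neg.txt", model.syn1neg)
    # Hyperparameters the Java side would need to mirror gensim's inference
    with open(prefix + "_config.json", "w") as f:
        json.dump({"vector_size": model.vector_size,
                   "window": model.window,
                   "negative": model.negative,
                   "epochs": model.epochs}, f)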
(If the toJson() and fromJson() methods in DL4J work with full model representations, which isn't clear from the docs and would be an extremely bloated format for the bulk of the model state, that'd likely make the model translation a little easier, as it'd provide a straightforward template for what some new Python code would need to write out.)