deep-learning, lstm, word2vec, opennlp, word-embedding

How to store word vector embeddings?


I'm very new to the NLP and Deep Learning fields and want to understand: after vectorizing a whole corpus using Word2Vec, do I need to store the word vector values locally? If so, how? I want to make a chatbot for Android. Can anyone please guide me on this?


Solution

  • word2vec embeddings can be saved:

    • in the first layers of your deep model. This is a rare approach, because in that case you can't reuse the word2vec embeddings for other tasks.
    • as an independent file on disk. This is the more viable approach for most use cases.

    I'd suggest using the gensim framework to train word2vec. You can learn more about how to train a word2vec model and save it to disk here: https://radimrehurek.com/gensim/models/word2vec.html

    In particular, saving is performed like this:

    from gensim.test.utils import common_texts  # small toy corpus bundled with gensim
    from gensim.models import Word2Vec

    model = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)  # "size" is renamed "vector_size" in gensim >= 4.0
    model.save("word2vec.model")
    
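    Once the model is on disk, you can reload it in any later process (for example, the backend your chatbot talks to) and look up vectors by word. A minimal sketch, reusing the file name from the snippet above:

    from gensim.models import Word2Vec

    model = Word2Vec.load("word2vec.model")
    vector = model.wv["human"]   # 100-dimensional vector; "human" appears in common_texts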

    Training a chatbot is a much harder problem. I can try to suggest a possible workflow, but you should clarify what type of chatbot you have in mind. E.g., should it answer any question (open domain)? Should it generate answers, or will it only have predefined answers?
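    If the bot only returns predefined answers, one simple baseline is retrieval: average the word2vec vectors of the words in the user's message, do the same for each stored question, and return the answer paired with the closest question by cosine similarity. A rough sketch under that assumption (the qa_pairs list is hypothetical, something you would define yourself):

    import numpy as np
    from gensim.models import Word2Vec

    model = Word2Vec.load("word2vec.model")

    def sentence_vector(text):
        # average the vectors of the words the model knows; zero vector if none are known
        words = [w for w in text.lower().split() if w in model.wv]
        if not words:
            return np.zeros(model.vector_size)
        return np.mean([model.wv[w] for w in words], axis=0)

    def best_answer(user_message, qa_pairs):
        # qa_pairs: hypothetical list of (question, answer) string pairs for your bot
        query = sentence_vector(user_message)
        scores = [np.dot(query, sentence_vector(q)) /
                  (np.linalg.norm(query) * np.linalg.norm(sentence_vector(q)) + 1e-9)
                  for q, _ in qa_pairs]
        return qa_pairs[int(np.argmax(scores))][1]

    Generating answers from scratch (especially open domain) is a much heavier task, which is why the type of chatbot matters so much.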