python-3.x · keras · nlp · word-embedding · bert-language-model

How to store Word vector Embeddings?


I am using BERT word embeddings for a sentence-classification task with 3 labels, coding in Google Colab. My problem is that I have to re-run the embedding step every time I restart the kernel. Is there any way to save these word embeddings once they are generated? Generating them takes a lot of time.

The code I am using to generate the BERT word embeddings is:

[get_features(text) for text in text_list]

Here, get_features is a function that returns the word embedding for each entry in my list text_list.

I read that converting the embeddings into numpy tensors and then using np.save can do it, but I don't know how to write that code.
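One thing worth noting before saving: BERT produces one vector per token, so sentences of different lengths yield arrays of different shapes, and wrapping them all in a single np.array can fail or silently produce an object array. Below is a minimal sketch of a save/load round-trip that handles this with np.savez, using small random arrays as stand-ins for the real embeddings (the shapes and the 768 hidden size are only illustrative assumptions):

```python
import numpy as np

# Stand-ins for real BERT outputs: each sentence yields a
# (num_tokens, hidden_size) array, and num_tokens varies per sentence.
embeddings = [np.random.rand(5, 768), np.random.rand(8, 768)]

# np.savez stores each array under its own key ('arr_0', 'arr_1', ...),
# so the differing lengths are preserved without forcing everything
# into one rectangular array.
np.savez('embeddings.npz', *embeddings)

# Reload later (e.g. after a kernel restart) and rebuild the list
# in the original order.
loaded = np.load('embeddings.npz')
restored = [loaded[f'arr_{i}'] for i in range(len(loaded.files))]
```

If all your sentences are padded/truncated to the same length, a plain np.save on one stacked array (as in the solution below) is simpler.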


Solution

  • You can save your embeddings to a NumPy file by following these steps:

    import numpy as np

    all_embeddings = here_is_your_function_return_all_data()
    all_embeddings = np.array(all_embeddings)
    np.save('embeddings.npy', all_embeddings)
    

    If you're saving it in Google Colab, you can download the file to your local computer. Whenever you need it, just upload it again and load it:

    all_embeddings = np.load('embeddings.npy')
    

    That's it.

    By the way, you can also save the file directly to Google Drive (e.g. by mounting your Drive in Colab).