Tags: nlp, gensim, word2vec

Gensim - Memory error using GoogleNews-vector model


I am getting a memory error when I use GoogleNews-vectors-negative300.bin or try to train a Gensim model on a Wikipedia corpus (1 GB). I have 4 GB of RAM on my system. Is there any way to work around this?

Can we host it on a cloud service like AWS to get better speed?


Solution

  • 4 GB is very tight for that vector set; you should have 8 GB or more to load the full set. Alternatively, you could use the optional limit argument of load_word2vec_format() to load only some of the vectors. For example, limit=500000 would load just the first 500,000 (instead of the full 3 million). Since the file appears to put the more frequently occurring tokens first, that may be sufficient for many purposes; see the sketch below.
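
A minimal sketch of the limit approach, assuming the GoogleNews-vectors-negative300.bin file sits in the working directory (adjust the path to wherever you downloaded it):

    from gensim.models import KeyedVectors

    # Load only the first 500,000 vectors. The GoogleNews file lists
    # the most frequent tokens first, so this keeps common words while
    # using a fraction of the memory needed for all 3 million vectors.
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin",
        binary=True,
        limit=500000,
    )

    # Quick sanity check that the truncated vocabulary is still useful.
    print(vectors.most_similar("computer", topn=5))

Raising or lowering limit trades vocabulary coverage against memory, so you can tune it to whatever fits in your 4 GB.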