I am getting a memory error when I load GoogleNews-vectors-negative300.bin or try to train a model with Gensim on a Wikipedia dataset corpus (1 GB). I have 4 GB of RAM in my system. Is there any way to work around this?
Could hosting it on a cloud service like AWS get better speed?
4 GB is very tight for that vector set; you would want 8 GB or more to load the full set. Alternatively, you could use the optional limit argument of load_word2vec_format() to load only some of the vectors. For example, limit=500000 would load just the first 500,000 vectors (instead of the full 3 million). Since the file appears to put the more frequently occurring tokens first, that may be sufficient for many purposes.
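A minimal sketch of that approach, assuming the .bin file is in the working directory and a reasonably recent Gensim version:

```python
from gensim.models import KeyedVectors

# Path to the pretrained Google News vectors (adjust to your local copy).
path = "GoogleNews-vectors-negative300.bin"

# limit=500000 loads only the first 500,000 vectors instead of all 3 million,
# which keeps memory use well within a 4 GB machine.
wv = KeyedVectors.load_word2vec_format(path, binary=True, limit=500000)

# Quick sanity check that the truncated vocabulary still works.
print(wv.most_similar("king", topn=5))
```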