I have written a program for a machine learning task.
To train it, I need to load a large amount of data into the program's RAM (for the required 'fit' function).
In the actual run, the 'load_Data' function should return two 'ndarrays' (from the 'numpy' library) of approximately 12,000 by 110,000 float64 values.
I get a MemoryError during the run.
I tested the program on a smaller dataset (a 2,000 by 110,000 array) and it works properly.
I have thought of two solutions:
1. Use a computer with more RAM (I am currently using 8 GB).
2. Call 'fit' 10 times, each time on a different part of the dataset.
So I want to ask:
Is solution #2 a good one?
Are there other solutions?
Thanks very much.
Of course the first solution is perfectly fine, but rather expensive: a single 12,000 by 110,000 float64 array already takes roughly 12,000 * 110,000 * 8 bytes ≈ 10.6 GB, which is more than your 8 GB of RAM. And what are you going to do once you have a dataset of many hundreds of gigabytes? It is prohibitive for most consumers to purchase that much RAM.
Indeed, batching (as you hinted at) is the most common way to train on really large datasets. Most machine learning toolkits allow you to provide your data in batches. As you have not said which one you use, I'll defer to e.g. the Keras documentation on how to set this up.
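As a minimal sketch (assuming the arrays have been saved to disk as one '.npy' file per chunk, and assuming a simple dense placeholder model; the file names, chunk count, and model are illustrative, not your actual setup), a 'keras.utils.Sequence' can stream one chunk per batch so the full array never has to sit in RAM at once:

    import numpy as np
    from tensorflow import keras

    class ChunkedData(keras.utils.Sequence):
        """Yields one pre-saved chunk (X_i.npy, y_i.npy) per batch."""
        def __init__(self, n_chunks):
            super().__init__()
            self.n_chunks = n_chunks

        def __len__(self):
            return self.n_chunks

        def __getitem__(self, i):
            X = np.load(f"X_{i}.npy")   # e.g. shape (1200, 110000), float64
            y = np.load(f"y_{i}.npy")
            return X, y

    # Placeholder model: one hidden layer over the 110,000 features.
    model = keras.Sequential([
        keras.Input(shape=(110000,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    # Each epoch streams the 10 chunks from disk instead of holding
    # the full 12,000 x 110,000 array in memory at once.
    model.fit(ChunkedData(n_chunks=10), epochs=5)

The same idea works with a plain Python generator passed to 'fit'; the key point is that only one chunk needs to be in memory at any time.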
Edit for scikit-learn: one can look here for a list of estimators that support batching (they expose a 'partial_fit' method for incremental learning).
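For instance (a minimal sketch, assuming an estimator that supports incremental learning, here 'SGDRegressor', and reusing the hypothetical per-chunk '.npy' files from above):

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    model = SGDRegressor()

    for epoch in range(5):              # several passes over the data
        for i in range(10):             # the 10 chunks of the full dataset
            X = np.load(f"X_{i}.npy")   # e.g. shape (1200, 110000)
            y = np.load(f"y_{i}.npy")
            model.partial_fit(X, y)     # updates the model incrementally

(For classifiers such as 'SGDClassifier', the first 'partial_fit' call additionally needs the full list of labels via its 'classes' argument.)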