Tags: r, performance, machine-learning, sparse-matrix, large-data

Big data memory issue in R


I've created a term-document matrix (tdm) in R which I want to write to a file. It is a large sparse matrix in simple triplet form, roughly 20,000 x 10,000. When I convert it to a dense matrix to add columns with cbind, I get out-of-memory errors and the process does not complete. I don't want to increase my RAM.

Also, I want to:

- bind the tf and tf-idf matrices together
- save the sparse/dense matrix to CSV
- run batch machine learning algorithms such as the J48 implementation in Weka

How do I save/load the dataset and run the batch ML algorithms within memory constraints?

If I can write a sparse matrix to a data store, can I run ML algorithms in R on that sparse matrix, still within memory constraints?


Solution

  • There could be several solutions:

    1) Convert your matrix from double to integer if you are dealing with integer counts. Integers need less memory than doubles (see the first sketch after this list).

    2) Try the bigmemory package (a file-backed sketch follows below).
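A minimal sketch of option 1, not the original poster's code: base R stores numeric matrices as 8-byte doubles, so switching the storage mode to 4-byte integers roughly halves the footprint of a dense count matrix. The matrix `m` and its dimensions here are illustrative.

```r
# Minimal sketch: integer storage halves per-cell memory vs. double
m <- matrix(0, nrow = 2000, ncol = 1000)   # term counts, stored as double by default
print(object.size(m), units = "MB")        # ~15.3 MB (8 bytes per cell)

storage.mode(m) <- "integer"               # convert in place to 4-byte integers
print(object.size(m), units = "MB")        # ~7.6 MB
```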
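And a hedged sketch of option 2, assuming integer term counts and hypothetical file names (`tdm.bin`, `tdm.desc`): a file-backed big.matrix from the bigmemory package lives on disk rather than in RAM, so a 20,000 x 10,000 matrix can be created, filled, and re-attached later without holding the dense data in memory.

```r
library(bigmemory)

# Create a file-backed matrix; the data live in tdm.bin on disk, not in RAM
x <- filebacked.big.matrix(nrow = 20000, ncol = 10000, type = "integer",
                           backingfile = "tdm.bin",
                           descriptorfile = "tdm.desc")

x[1, 1] <- 5L          # standard matrix indexing works

# Later (or in another R session), re-attach via the descriptor file
y <- attach.big.matrix("tdm.desc")
y[1, 1]
```

Note that bigmemory matrices are dense on disk, so this trades RAM for disk space rather than exploiting the sparsity of the tdm.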