
MATLAB out of memory: ultimate solution


I have a really large file, around 10 GB. It doesn't fit in memory, so I converted it to a .mat file. But the 'out of memory' error still comes up when I try clustering. I think the ultimate solution is to keep the working data on disk instead of in memory. But I need to call MATLAB's kmeans() function. Is there a way to put the local variables inside kmeans() on disk as well, without rewriting the function?


Solution

  • You need a strategy to deal with large data sets. Possibilities are:

    1. Use a system with enough memory
    2. Reduce the precision of your data set. For clustering, small errors and scaling are usually not important, so convert attributes to scaled uint8 or uint16 if possible. (And obviously, delete all irrelevant data.)
    3. Use more appropriate algorithms. I'm not an expert in this field, but CLARA and CLARANS are two alternatives. These algorithms work on only a subset of the data at a time, so it should be possible to combine them with matfile to keep only the relevant parts in memory.
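
The third option can be sketched roughly as follows. This is not CLARA or CLARANS itself: it only borrows the sampling idea, fits kmeans() on a sample that fits in memory, and then assigns all rows chunk by chunk through matfile. It assumes the data is an n-by-d numeric matrix stored under the name X in a -v7.3 MAT-file, and that the Statistics and Machine Learning Toolbox (kmeans, pdist2) is available. The file name, chunk size, sample size, and k are placeholders, and the small demo file exists only so the sketch runs end to end.

```matlab
% Stand-in data (assumption): two well-separated blobs saved as variable X.
rng(0);
X = [randn(5000, 3); randn(5000, 3) + 6];
demoFile = fullfile(tempdir, 'bigdata_demo.mat');   % hypothetical file name
save(demoFile, 'X', '-v7.3');    % -v7.3 allows efficient partial reads
clear X

m = matfile(demoFile);           % opens the file without loading X
[n, d] = size(m, 'X');
k          = 2;                  % number of clusters (placeholder)
chunkRows  = 2000;               % rows read from disk per pass
sampleRows = 1000;               % approximate rows kept for fitting

% 1) Build a sample: read one chunk at a time, keep a few random rows.
%    (matfile only reads evenly spaced index ranges, so the random pick
%    happens in memory after each contiguous read.)
nChunks  = ceil(n / chunkRows);
perChunk = ceil(sampleRows / nChunks);
sample   = zeros(0, d);
for c = 1:nChunks
    r1 = (c - 1) * chunkRows + 1;
    r2 = min(c * chunkRows, n);
    block  = m.X(r1:r2, :);                      % only this chunk in memory
    pick   = randperm(size(block, 1), min(perChunk, size(block, 1)));
    sample = [sample; block(pick, :)];           %#ok<AGROW>
end

% 2) Fit the centroids on the in-memory sample only.
[~, centroids] = kmeans(sample, k, 'Replicates', 3);

% 3) Assign every row to its nearest centroid, again chunk by chunk.
labels = zeros(n, 1, 'uint8');                   % uint8 is enough for k <= 255
for c = 1:nChunks
    r1 = (c - 1) * chunkRows + 1;
    r2 = min(c * chunkRows, n);
    block = m.X(r1:r2, :);
    [~, idx] = min(pdist2(block, centroids), [], 2);
    labels(r1:r2) = idx;
end
```

Peak memory here is roughly one chunk plus the sample, so chunkRows and sampleRows are the knobs to tune. The centroids come from a sample, so they approximate what kmeans() would return on the full data, and the quality depends on how representative the sample is.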