Process of extracting data:
I am analyzing 4000 to 8000 DICOM files using MATLAB code. The DICOM files are read with the dicomread()
function. Each DICOM file contains 932*128 photon count data coming from 7 detectors. While reading the DICOM files, I convert the data to double and store it in 7 cell array
variables (one per detector). So each cell contains 128*128 photon count data, and each cell array contains 4000 to 8000 cells.
Question:
When I save each variable separately, each one is 3GB, so the 7 variables come to 21GB. Saving them and reading them back takes an awful lot of time. (My computer has 4GB of RAM.) Is there a way to reduce the size of the variables?
Thanks.
A different data type will help. You can save the data as float (single in MATLAB) instead of double, as DICOM files have it as float too (from http://northstar-www.dartmouth.edu/doc/idl/html_6.2/DICOM_Attributes.html; Graphic Data). This halves the size at no loss, since single represents whole-number counts exactly up to 2^24. You might want to expand to double when doing operations on the data to avoid inaccuracies creeping in. Additional compression by saving it as uint16 (a further x2 space saving) or even uint8 (x4) might be possible, but I would be wary of this - it might work great in all test cases but cause problems when you least expect it, for example when a count exceeds the type's maximum (65535 for uint16, 255 for uint8).
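To make the size argument concrete, here is a minimal sketch using a synthetic 128x128 frame in place of real DICOM data. The variable names and the random counts are made up for illustration; the point is only to compare the memory footprint and to check that the round trip through single is exact for this kind of data.

```matlab
% Synthetic stand-in for one 128x128 frame of photon counts (hypothetical data).
counts = double(randi([0 65535], 128, 128));

countsSingle = single(counts);      % 4 bytes per element instead of 8
infoD = whos('counts');
infoS = whos('countsSingle');
fprintf('double: %d bytes, single: %d bytes\n', infoD.bytes, infoS.bytes);

% Whole numbers up to 2^24 are represented exactly in single precision,
% so converting back recovers the original counts.
isLossless = isequal(double(countsSingle), counts);

% uint16 is only safe if every count is a whole number in [0, 65535].
fitsUint16 = all(counts(:) == round(counts(:))) && ...
             min(counts(:)) >= 0 && ...
             max(counts(:)) <= double(intmax('uint16'));
```

Running a check like fitsUint16 over the real data before switching to an integer type is one way to catch the "least expect it" case early.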
A cell array is not problematic in terms of speed or size - you will not gain (much) by switching to something else. It is your data that gobbles up memory, not the cell array itself. If you wish, you can save the data in a 128x128x7x8000 float array - it should work just fine too. But if the number of images (the 4000-8000) can increase at any point, resizing the array will be a pretty costly operation in terms of space and time. Cell arrays are much easier to extend - 8k values to move around instead of 8k*115k=900M values.
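The two layouts can be sketched side by side as follows. This is only an illustration with a small, made-up frame count and random data; nFrames, stack and frames are hypothetical names, not anything from the question.

```matlab
nFrames    = 20;   % hypothetical; stands in for 4000 to 8000
nDetectors = 7;

% Option 1: one preallocated 4-D single array (size known up front).
stack = zeros(128, 128, nDetectors, nFrames, 'single');

% Option 2: a cell array with one 128x128x7 block per file.
frames = cell(nFrames, 1);

for k = 1:nFrames
    block = single(randi([0 1000], 128, 128, nDetectors));  % synthetic data
    stack(:, :, :, k) = block;
    frames{k} = block;
end

% Extending the cell array adds one cell; the existing data blocks stay put.
frames{end + 1} = zeros(128, 128, nDetectors, 'single');

% Extending the 4-D array along the last dimension builds a whole new array.
stack = cat(4, stack, zeros(128, 128, nDetectors, 'single'));
```

If the final number of files is known before reading starts, preallocating the 4-D array once avoids the costly resize entirely.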
Another option is to split the data into chunks. You probably don't need to work on all 4000 images at once. You can load 500 images, finish your work on them, move on to the next 500 images, and so on. The batch size obviously depends on your hardware and on what processing you do with the data, but I guess about 500 could be a pretty reasonable starting point.
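A rough sketch of such a batch loop is shown below. It assumes the processing can be expressed as a running result (here a per-pixel mean, chosen only as an example), and it uses a made-up loadFrame function handle in place of the real file reading; in practice that handle would wrap dicomread on your own file list.

```matlab
nFiles    = 45;    % hypothetical; stands in for 4000 to 8000
batchSize = 10;    % hypothetical; something like 500 for the real data

% Hypothetical loader: in real use this would wrap dicomread(fileNames{k}).
loadFrame = @(k) single(k) * ones(128, 128, 'single');

total = zeros(128, 128);          % running result kept in double
nProcessed = 0;

for first = 1:batchSize:nFiles
    last = min(first + batchSize - 1, nFiles);
    nInBatch = last - first + 1;

    % Only this batch is held in memory at any one time.
    batch = zeros(128, 128, nInBatch, 'single');
    for k = first:last
        batch(:, :, k - first + 1) = loadFrame(k);
    end

    % Example processing step: accumulate the per-pixel sum in double.
    total = total + sum(double(batch), 3);
    nProcessed = nProcessed + nInBatch;
end

meanImage = total / nProcessed;
```

The batch array is overwritten on every pass, so peak memory is set by batchSize rather than by the total number of files.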