Tags: python, numpy, ram, labview

Memory used by numpy arrays larger than RAM?


I have read very large TDMS files containing sensor data into lists of numpy arrays. The structure is as follows: the data from several files is stored in instances of an object called file_data. The object has a property for each sensor type, which is essentially a list of numpy arrays (one per individual sensor of that type).

I wanted to know how much data I am actually storing here (the size of the TDMS files generated by LabVIEW did not seem very meaningful, given all the metadata they contain).

This is the code:

# Check how much memory the arrays would occupy
total = 0
file_data = [file_data1, file_data2, ...] # list of data objects read from six files
for no, f in enumerate(file_data):
    sensor_types = [f.sensortype1, f.sensortype2, ...] # list of sensor types
    file_size = 0
    for sensor_type in sensor_types: # list of numpy arrays
        for data in sensor_type: # np.array
            file_size += data.size * data.itemsize # element count * bytes per element (same as data.nbytes)
    total += file_size
    print('Data from file {}, size: {:.2f} GB'.format(no + 1, file_size / 1024**3))
print('Total memory: {:.2f} GB'.format(total / 1024**3))

Now this gives me the following output:

Data from file 1, size: 2.21 GB
Data from file 2, size: 1.88 GB
Data from file 3, size: 2.27 GB
Data from file 4, size: 1.53 GB
Data from file 5, size: 1.01 GB
Data from file 6, size: 0.66 GB
Total memory: 9.56 GB

But I am working on a Mac with 8 GB of RAM, so this number really surprised me: the program didn't crash, and I can work with the data. Where am I mistaken?


Solution

  • I guess you use npTDMS.

    The arrays you get are not necessarily plain numpy arrays whose elements are all held in memory. While the data type and the number of elements are known (from the metadata in the TDMS file, in this case), the elements themselves are not read from disk until they are requested.

    That is: if you want the last element of a 20 GB record, npTDMS knows where it is stored in the file, reads just that element and returns it - without reading the 20 GB that come before it. A minimal sketch of this lazy access is shown below.
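
If this lazy behaviour is indeed what is happening, only the portions of the data you actually index are pulled into RAM. The following is a minimal sketch of reading a small slice from a large channel, assuming npTDMS >= 1.0 and its streaming TdmsFile.open API; the file path and the group and channel names are placeholders, and the psutil check is only there to show how the process's resident memory compares to the logical array size.

# Minimal sketch (assumes npTDMS >= 1.0 streaming mode; names are placeholders)
import os
import psutil                      # only used to inspect the process's resident memory
from nptdms import TdmsFile

def rss_gb():
    # Resident set size of the current process in GB
    return psutil.Process(os.getpid()).memory_info().rss / 1024**3

print('RSS before opening: {:.2f} GB'.format(rss_gb()))

# TdmsFile.open() reads only the metadata up front; channel data stays on
# disk until it is explicitly indexed or sliced.
with TdmsFile.open('my_data.tdms') as tdms_file:
    channel = tdms_file['SensorGroup']['Sensor1']

    # Only this slice is read from the file, not the whole channel.
    first_values = channel[0:1000]
    print('Read {} values'.format(len(first_values)))
    print('RSS after reading a slice: {:.2f} GB'.format(rss_gb()))

If you use the non-streaming TdmsFile.read() instead, npTDMS also accepts a memmap_dir argument, which (as far as I know) backs the arrays with memory-mapped temporary files on disk rather than holding all the data in RAM at once.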