I have read some very large TDMS files containing sensor data into lists of numpy arrays. The structure is the following: the data from several files is stored in instances of an object called file_data. The object has a property for each sensor type, which is basically a list of numpy arrays (one per individual sensor of that type).
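Roughly, the objects look like this (just an illustrative sketch; the class and attribute names are not my real code):

import numpy as np

class FileData:
    # Data read from one TDMS file: one attribute per sensor type,
    # each holding a list of numpy arrays (one array per sensor).
    def __init__(self, sensortype1, sensortype2):
        self.sensortype1 = sensortype1  # e.g. list of np.ndarray
        self.sensortype2 = sensortype2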
I wanted to know how much data I am actually storing here (the size of the TDMS files generated by LabVIEW did not seem very meaningful, with all the metadata they contain).
This is the code:
# Check memory
total = 0
file_data = [file_data1, file_data2, ...]  # list of data objects read from six files
for no, f in enumerate(file_data):
    sensor_types = [f.sensortype1, f.sensortype2, ...]  # list of sensor types
    sum = 0
    for sensor_type in sensor_types:  # list of arrays, one per sensor
        for data in sensor_type:  # np.array
            sum += (data.size * data.itemsize)
    total += sum
    print('Data from file {}, size: {:.2f} GB'.format(no + 1, sum / (1024**3)))
print('Total memory: {:.2f} GB'.format(total / (1024**3)))
Now this gives me the following output:
Data from file 1, size: 2.21 GB
Data from file 2, size: 1.88 GB
Data from file 3, size: 2.27 GB
Data from file 4, size: 1.53 GB
Data from file 5, size: 1.01 GB
Data from file 6, size: 0.66 GB
Total memory: 9.56 GB
But I am working on a Mac with 8 GB of RAM, so this number really surprised me, since the program didn't crash and I can work with the data. Where am I mistaken?
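For reference, one way I could cross-check this would be to look at the resident memory of the Python process itself, e.g. with psutil (just a sketch; psutil is an extra dependency and not part of my original script):

import os
import psutil

# Resident set size of the current Python process, in GB.
# If this is far below the 9.56 GB computed above, the arrays
# cannot all be sitting in physical memory at once.
process = psutil.Process(os.getpid())
rss_gb = process.memory_info().rss / (1024 ** 3)
print('Process RSS: {:.2f} GB'.format(rss_gb))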
I guess you are using npTDMS.
The numpy.array type it uses is not just a simple array whose elements are all kept in memory at once.
While the data type and the number of elements are known (by reading the metadata from the TDMS file, in this case), the elements themselves are not read until they are requested.
That is: if you want the last element of a 20 GB record, npTDMS knows where it is stored in the file, reads it and returns it, without reading the first 20 GB.
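As a rough illustration of that deferred reading (I don't know which npTDMS version you use; this is a sketch against the npTDMS 1.x streaming API, and the file name is a placeholder):

from nptdms import TdmsFile

# Open the file in streaming mode: only the metadata is read here,
# not the bulk channel data.
with TdmsFile.open('measurement.tdms') as tdms_file:
    for group in tdms_file.groups():
        for channel in group.channels():
            n = len(channel)  # number of values, known from the metadata
            if n == 0:
                continue
            # Read just the last value of this channel; npTDMS seeks to the
            # right position in the file instead of loading everything.
            last_value = channel[n - 1:n]
            print(group.name, channel.name, n, last_value)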