Hi folks,
I've got a Python process that generates matrices. These are stacked on top of each other and saved as a tensor. Here is the code:
import numpy as np
import tables

h5file = tables.open_file("data/tensor.h5", mode="w", title="tensor")
atom = tables.Atom.from_dtype(np.dtype("int16"))
tensor_shape = (N, 3, MAT_SIZE, MAT_SIZE)
tensor = h5file.create_carray(h5file.root, "tensor", atom, tensor_shape)
for i in range(N):
    mat = generate(i)
    tensor[i, :, :] = mat
The problem is that when it hits 8 GB, the process goes out of memory. Shouldn't the HDF5 format prevent that, i.e. move data from memory to disk when required?
When you are using PyTables, data is kept in memory until the file is flushed or closed (see more here: In-memory HDF5 files).
I recommend having a look at the append and flush methods of PyTables, as I think that's exactly what you want. Be aware that flushing the buffer on every loop iteration will significantly reduce the performance of your code, due to the constant I/O that needs to be performed.
Also, writing the file in chunks (just like when reading data into DataFrames in pandas) might pique your interest - see more here: PyTables optimization