Tags: python, numpy, scipy, hdf5, pytables

Storing numpy sparse matrix in HDF5 (PyTables)


I am having trouble storing a SciPy csr_matrix with PyTables. I'm getting this error:

TypeError: objects of type ``csr_matrix`` are not supported in this context, sorry; supported objects are: NumPy array, record or scalar; homogeneous list or tuple, integer, float, complex or string

My code:

f = tables.openFile(path,'w')

atom = tables.Atom.from_dtype(self.count_vector.dtype)
ds = f.createCArray(f.root, 'count', atom, self.count_vector.shape)
ds[:] = self.count_vector
f.close()

Any ideas?

Thanks


Solution

  • A CSR matrix can be fully reconstructed from its data, indices and indptr attributes. These are regular NumPy arrays, so they can be stored as three separate arrays in PyTables and later passed back to the csr_matrix constructor. See the SciPy docs.

    Edit: Pietro's answer points out that the shape attribute should also be stored, since it cannot always be inferred from the three arrays (for example, when the trailing columns are empty).
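
The approach described above could look roughly like the following. This is a minimal sketch, not the asker's actual code: the helper names save_csr and load_csr, the group name "count" and the file name are made up for illustration, and it assumes PyTables 3.x (open_file / create_array rather than the older openFile / createCArray spelling used in the question). A matrix with no stored entries is not handled specially here.

```python
import os
import tempfile

import numpy as np
import tables
from scipy.sparse import csr_matrix


def save_csr(path, matrix, group_name="count"):
    """Write data, indices, indptr and shape as four arrays in one group.

    save_csr and group_name are hypothetical names chosen for this sketch.
    """
    with tables.open_file(path, mode="w") as f:
        group = f.create_group(f.root, group_name)
        for attr in ("data", "indices", "indptr"):
            f.create_array(group, attr, obj=getattr(matrix, attr))
        # The shape is stored too, because trailing empty columns
        # cannot be recovered from the other three arrays.
        f.create_array(group, "shape", obj=np.asarray(matrix.shape))


def load_csr(path, group_name="count"):
    """Read the four arrays back and rebuild the csr_matrix."""
    with tables.open_file(path, mode="r") as f:
        group = f.get_node(f.root, group_name)
        data = f.get_node(group, "data").read()
        indices = f.get_node(group, "indices").read()
        indptr = f.get_node(group, "indptr").read()
        shape = tuple(int(n) for n in f.get_node(group, "shape").read())
    return csr_matrix((data, indices, indptr), shape=shape)


if __name__ == "__main__":
    # The last row and the last column are empty on purpose.
    original = csr_matrix(np.array([[1, 0, 2, 0],
                                    [0, 0, 3, 0],
                                    [0, 0, 0, 0]]))
    path = os.path.join(tempfile.mkdtemp(), "count_vector.h5")
    save_csr(path, original)
    restored = load_csr(path)
    print(restored.shape)
```

Keeping the four arrays in their own group makes it possible to store several sparse matrices in one file without their names colliding.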