ValueError: Saving in appending mode in PyTables


I have 100 images, each of size 85*85 (width*height), stored in a NumPy array (data) as follows.

import numpy as np
import tables as tb


data = np.random.rand(100, 85, 85)
print(data.shape)

I want to save each image to an HDF5 file one at a time, in appending mode.

fo = "data.h5"

h5 = tb.open_file(fo, mode='w')

group = h5.create_group(h5.root, 'data')

atom = tb.Float64Atom()

ds = h5.create_earray(group, 'test', atom,
                       (0, data.shape[1], data.shape[2]))

for ix in range(data.shape[0]):
    dd = data[ix, :, :]
       
    ds.append(dd)

ds.flush()
ds.close()

However, I get the following error:

ValueError: the ranks of the appended object (2) and the /data/test EArray (3) differ


Solution

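  • Be careful with how you index the array. With dd = data[ix, :, :], the integer index drops the first axis, so dd.shape is (85, 85): a rank-2 array, while the EArray is rank 3. Use dd = data[ix:ix+1, :, :] instead. The slice keeps the first axis, so dd.shape is (1, 85, 85), i.e. a single row of the EArray.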

    Appending row by row is not efficient if you have a lot of rows. It is better to collect them in one array and append the entire array in a single call, as ds2.append(data) does in the code below.

    Here is the updated solution. Note that I prefer with/as for opening files, because the file gets closed cleanly even if an error occurs.

    with tb.open_file(fo, mode='w') as h5:
        group = h5.create_group(h5.root, 'data')
        atom = tb.Float64Atom()
        ds = h5.create_earray(group, 'test', atom,
                              (0, data.shape[1], data.shape[2]))
        # Append one image at a time; the slice keeps the leading axis
        for ix in range(data.shape[0]):
            dd = data[ix:ix+1, :, :]
            print(dd.shape)
            ds.append(dd)

        # Create an EArray together with its parent group,
        # then append all of the image data at one time
        ds2 = h5.create_earray('/data2', 'test2', atom,
                               (0, data.shape[1], data.shape[2]),
                               createparents=True)
        ds2.append(data)
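
To see why the slice matters, the rank difference can be checked with NumPy alone, without touching the HDF5 file. This is only a small sketch; the variable names single, kept and expanded are mine and not from the question. It also shows np.newaxis as an alternative way to put the leading axis back.

```python
import numpy as np

data = np.random.rand(100, 85, 85)

# An integer index drops axis 0, so the result is rank 2.
single = data[0, :, :]

# A length-1 slice keeps axis 0, so the result is rank 3.
kept = data[0:1, :, :]

# Alternative: index with an integer, then re-insert the leading axis.
expanded = data[0][np.newaxis, ...]

print(single.shape, kept.shape, expanded.shape)
# (85, 85) (1, 85, 85) (1, 85, 85)
```

Either kept or expanded has the rank that an EArray of shape (0, 85, 85) expects in append().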
    

    If you want to load all of the data into a single EArray, you can simply pass your array through the obj=data parameter when you create it. The resulting EArray is still extendable along dimension 0. See the modified code below.

    h5 = tb.open_file(fo, mode='w')
    group = h5.create_group(h5.root, 'data')
    ds = h5.create_earray('/data', 'test', atom,
                          (0, data.shape[1], data.shape[2]),
                          obj=data)
    ds.flush()  # not necessary
    ds.close()  # not necessary
    h5.close()  # REQUIRED!
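
Since the question is about appending, here is a minimal sketch of adding more images to an EArray that already exists on disk, by reopening the file with mode='a'. It assumes a PyTables version in which create_earray accepts obj without an explicit atom and shape (both are then inferred from the array). The temporary file path and the 60/40 split of the images are hypothetical and only there for illustration.

```python
import os
import tempfile

import numpy as np
import tables as tb

data = np.random.rand(100, 85, 85)

# Hypothetical scratch location, so the sketch does not overwrite a real file.
path = os.path.join(tempfile.mkdtemp(), "data.h5")

# First session: create the EArray from the first 60 images.
# atom and shape are inferred from obj; axis 0 is the extendable one.
with tb.open_file(path, mode="w") as h5:
    h5.create_earray("/data", "test", obj=data[:60], createparents=True)

# Later session: reopen in append mode and add the remaining 40 images.
with tb.open_file(path, mode="a") as h5:
    ds = h5.get_node("/data/test")
    ds.append(data[60:])
    n_rows = ds.nrows
    stored_shape = ds.shape

# Read everything back to check that nothing was lost or reordered.
with tb.open_file(path, mode="r") as h5:
    restored = h5.get_node("/data/test").read()

print(n_rows, stored_shape, np.array_equal(restored, data))
```

Note that mode='w' would recreate the file from scratch, so mode='a' (or 'r+') is the one to use when the existing rows have to be kept.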