
merging several hdf5 files into one pytable


I have several hdf5 files, each of them with the same structure. I'd like to create one pytable out of them by somehow merging the hdf5 files.

What I mean is that if an array in file1 has size x and an array in file2 has size y, the resulting array in the merged PyTables file will have size x + y, containing first all the entries from file1 and then all the entries from file2.


Solution

  • How you want to do this depends on the data type that you have. Arrays and CArrays have a static size, so you need to preallocate the destination's data space. Thus you would do something like the following:

    import tables as tb

    # open the sources read-only and the destination writable
    file1 = tb.open_file('/path/to/file1', 'r')
    file2 = tb.open_file('/path/to/file2', 'r')
    file3 = tb.open_file('/path/to/file3', 'w')
    x = file1.root.x
    y = file2.root.y

    # preallocate a destination array large enough for both sources,
    # then fill each half
    z = file3.create_array('/', 'z', atom=x.atom, shape=(x.nrows + y.nrows,))
    z[:x.nrows] = x[:]
    z[x.nrows:] = y[:]

    file1.close()
    file2.close()
    file3.close()
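
    Note that x[:] and y[:] read each source entirely into memory. If the arrays are too large for that, a chunked copy keeps memory bounded. Here is a minimal sketch under the same layout as above; the chunk size is an arbitrary placeholder:

    import tables as tb

    CHUNK = 2 ** 20  # rows copied per step; tune to your memory budget

    file1 = tb.open_file('/path/to/file1', 'r')
    file2 = tb.open_file('/path/to/file2', 'r')
    file3 = tb.open_file('/path/to/file3', 'w')
    x = file1.root.x
    y = file2.root.y

    z = file3.create_array('/', 'z', atom=x.atom, shape=(x.nrows + y.nrows,))
    # copy each source slice by slice so at most CHUNK rows are in memory
    for start in range(0, x.nrows, CHUNK):
        stop = min(start + CHUNK, x.nrows)
        z[start:stop] = x[start:stop]
    for start in range(0, y.nrows, CHUNK):
        stop = min(start + CHUNK, y.nrows)
        z[x.nrows + start:x.nrows + stop] = y[start:stop]

    file1.close()
    file2.close()
    file3.close()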
    

    However, EArrays and Tables are extendable. Thus you don't need to preallocate the size and can use copy_node() and append() instead.

    import tables as tb

    file1 = tb.open_file('/path/to/file1', 'r')
    file2 = tb.open_file('/path/to/file2', 'r')
    file3 = tb.open_file('/path/to/file3', 'w')
    x = file1.root.x
    y = file2.root.y

    # copy the extendable node from file1 into file3,
    # then append all of file2's entries to it
    z = file1.copy_node('/', name='x', newparent=file3.root, newname='z')
    z.append(y[:])

    file1.close()
    file2.close()
    file3.close()
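
    The appendable approach also generalizes to more than two files. Here is a minimal sketch, assuming every input stores an extendable node (EArray or Table) at /x; the glob pattern, output path, and node names are placeholders:

    import tables as tb
    from glob import glob

    paths = sorted(glob('/path/to/inputs/*.h5'))  # hypothetical input layout

    file3 = tb.open_file('/path/to/merged.h5', 'w')

    # seed the output by copying the node from the first file,
    # then append every remaining file's rows onto it
    with tb.open_file(paths[0], 'r') as f:
        z = f.copy_node('/', name='x', newparent=file3.root, newname='z')
    for path in paths[1:]:
        with tb.open_file(path, 'r') as f:
            z.append(f.root.x[:])

    file3.close()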