Tags: python, hdf5, h5py, pytables

How to copy a dataset object to a different hdf5 file using pytables or h5py?


I have selected specific HDF5 datasets and want to copy them to a new HDF5 file. I found some tutorials on copying between two existing files, but what if you have just created a new file and want to copy datasets into it? I thought the approach below would work, but it doesn't. Is there a simple way to do this?

>>> dic_oldDataset['old_dataset']
<HDF5 dataset "old_dataset": shape (333217,), type "|V14">

>>> new_file = h5py.File('new_file.h5', 'a')
>>> new_file.create_group('new_group')

>>> new_file['new_group']['new_dataset'] = dic_oldDataset['old_dataset']


RuntimeError: Unable to create link (interfile hard links are not allowed)

Solution

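  • Answer 1 (using h5py):
    Assigning an existing dataset object to a name in another file asks HDF5 to create a hard link to it, and hard links cannot cross file boundaries, hence the RuntimeError. Instead, create a new dataset in the destination file and pass the source data to it. The example below creates a simple structured array to populate a dataset in the first file. It then gets that dataset as my_array (an h5py dataset object, not a NumPy array) and passes it as data= to create_dataset on the second file, which reads the values and writes a copy.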

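    import h5py
    import numpy as np

    # Build a small structured array to use as sample data.
    arr = np.array([(1, 'a'), (2, 'b')],
                   dtype=[('foo', int), ('bar', 'S1')])
    print(arr.dtype)

    # Create the first file and store the array as a dataset.
    h5file1 = h5py.File('test1.h5', 'w')
    h5file1.create_dataset('/ex_group1/ex_ds1', data=arr)
    print(h5file1)

    # Dataset object in the first file; its data is read when passed as data=.
    my_array = h5file1['/ex_group1/ex_ds1']

    # Create the second file and write the data to a new dataset.
    h5file2 = h5py.File('test2.h5', 'w')
    h5file2.create_dataset('/exgroup2/ex_ds2', data=my_array)
    print(h5file2)

    h5file1.close()
    h5file2.close()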
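
As a possible alternative to reading the data and writing it back, h5py groups and files also provide a copy() method that can copy an object into a group in a different file, and by default it carries the dataset's attributes along. The following is a minimal sketch, not a definitive recipe. The temporary directory, the file paths, and the "units" attribute are assumptions made up for illustration; the group and dataset names mirror those in the question and the answer above.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file locations in a temporary directory (illustration only).
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "test1.h5")
dst_path = os.path.join(tmp_dir, "test2.h5")

# Sample structured data, similar to the example above.
arr = np.array([(1, b"a"), (2, b"b")], dtype=[("foo", "i8"), ("bar", "S1")])

# Create the source file with one dataset and one (made-up) attribute.
with h5py.File(src_path, "w") as src:
    ds = src.create_dataset("/ex_group1/ex_ds1", data=arr)
    ds.attrs["units"] = "counts"

# Open the source read-only and the destination in append mode,
# then copy the dataset into a group of the destination file.
with h5py.File(src_path, "r") as src, h5py.File(dst_path, "a") as dst:
    new_group = dst.create_group("new_group")
    src.copy(src["/ex_group1/ex_ds1"], new_group, name="new_dataset")

# Read the copy back to check that data and attribute arrived.
with h5py.File(dst_path, "r") as dst:
    copied = dst["/new_group/new_dataset"][...]
    copied_units = dst["/new_group/new_dataset"].attrs["units"]

print(copied)
print(copied_units)
```

Compared with create_dataset(..., data=...), this avoids naming the dtype or shape yourself and keeps attributes with the dataset. Whether it is preferable depends on whether you want an exact copy or want to transform the data on the way.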