Tags: python-3.x, numpy, append, data-science, pytables

Automated creation of multiple datasets in Python-Pytables


In my script, I create several datasets manually:

    import tables
    dset1 = f.create_earray(f.root, "dataset1", atom=tables.Float64Atom(), shape=(0, 2))
    dset2 = f.create_earray(f.root, "dataset2", atom=tables.Float64Atom(), shape=(0, 2))
    dset3 = f.create_earray(f.root, "dataset3", atom=tables.Float64Atom(), shape=(0, 2))
    ...

I want to achieve two things:

  1. Automate the statements above in a loop, so that I can create any desired number (N) of datasets.

  2. Automate the sequential calls to the .append method, shown below:

     dset1.append(np_array1) 
     dset2.append(np_array2) 
     dset3.append(np_array3) 
     ...
    

I would appreciate any assistance.


Solution

  • It's hard to provide specific advice without more details. If you already have the NumPy arrays, you can create each EArray with its data in a single call (using the obj= parameter). Here's a short code snippet that shows how to do this in a loop.

    import tables as tb
    import numpy as np
    
    with tb.open_file('SO_64397597.h5','w') as h5f:
        
        arr1  = np.ones((10,2))
        arr2  = 2.*np.ones((10,2))
        arr3  = 3.*np.ones((10,2))
        arr_list = [arr1, arr2, arr3]
        for cnt in range(1,4):
            h5f.create_earray("/", "dataset"+str(cnt), obj=arr_list[cnt-1])
    

    The code above doesn't keep references to the dataset objects. If you need them, you can retrieve them programmatically with get_node:

    # pass the full path to the node as where; name is not required
    ds = h5f.get_node("/dataset1")
    # or
    # pass the path to the group as where, and the dataset name as name
    ds = h5f.get_node("/","dataset1")
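
As an alternative to looking nodes up afterwards, the handles can be kept at creation time, because create_earray returns the new EArray. The following is a minimal sketch, not part of the original answer: the file name, the dsets dictionary, and the sample arrays are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical sample data: three (10, 2) arrays filled with 1.0, 2.0, 3.0.
arrays = [float(k) * np.ones((10, 2)) for k in range(1, 4)]

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "handles_demo.h5")

with tb.open_file(path, "w") as h5f:
    # create_earray returns the new EArray, so each handle can be stored
    # in a dictionary keyed by dataset name.
    dsets = {}
    for cnt, arr in enumerate(arrays, start=1):
        name = f"dataset{cnt}"
        dsets[name] = h5f.create_earray("/", name, obj=arr)

    # A stored handle supports append like any other EArray.
    dsets["dataset2"].append(np.zeros((5, 2)))

    # Record the shapes while the file is still open.
    shapes = {name: ds.shape for name, ds in dsets.items()}
```

The handles are only valid while the file is open, so anything needed later (here, the shapes) should be read inside the with block.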
    

    If you don't have the arrays when you create the datasets, you can create the empty EArrays in a first loop, then append the NumPy array data in a second loop:

    with tb.open_file('SO_64397597.h5','w') as h5f:
    
        # first loop: create the empty, extendable datasets
        for cnt in range(1,4):
            h5f.create_earray("/", "dataset"+str(cnt), atom=tb.Float64Atom(), shape=(0, 2))
    
        # get array data... (arr1, arr2, arr3 as defined in the first snippet)
        arr_list = [arr1, arr2, arr3]
        # second loop: append the array data
        for cnt in range(1,4):
            h5f.get_node("/","dataset"+str(cnt)).append(arr_list[cnt-1])
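
The two-loop approach can be generalized to any number N of datasets and to repeated appends. The sketch below is one possible way to do it, not the original author's code: the helper names create_datasets and append_all, the file name, and the generated sample data are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tables as tb


def create_datasets(h5f, n, ncols=2):
    """Create n empty, extendable Float64 datasets named dataset1..datasetN."""
    for cnt in range(1, n + 1):
        h5f.create_earray("/", f"dataset{cnt}",
                          atom=tb.Float64Atom(), shape=(0, ncols))


def append_all(h5f, arrays):
    """Append arrays[k] to dataset(k+1), looking each node up by name."""
    for cnt, arr in enumerate(arrays, start=1):
        h5f.get_node("/", f"dataset{cnt}").append(arr)


# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "n_datasets_demo.h5")
n = 5

with tb.open_file(path, "w") as h5f:
    create_datasets(h5f, n)
    # Append two batches of hypothetical (10, 2) data to every dataset.
    for _ in range(2):
        append_all(h5f, [k * np.ones((10, 2)) for k in range(1, n + 1)])
    # Row counts after both batches have been appended.
    nrows = [h5f.get_node("/", f"dataset{cnt}").nrows for cnt in range(1, n + 1)]
```

Looking nodes up by name keeps the helpers stateless; if the lookups become a bottleneck, the EArray objects returned by create_earray could be cached in a list instead.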