
Recommended way to implement a parent-child relationship in PyTables


I'm working with PyTables and trying to implement a parent-child relationship. For example, I want to store multiple teams, each with multiple players. I can do it with two tables, where each player row stores the id of its team:

import tables as tb


class Team(tb.IsDescription):
    id = tb.Int32Col()       # unique id of the team
    name = tb.StringCol(20)  # name of the team


class Player(tb.IsDescription):
    team = tb.Int32Col()     # link to the parent: equals Team.id
    name = tb.StringCol(20)  # name of the player


with tb.open_file('test.h5', mode='w', title='test') as f:
    table_team = f.create_table(f.root, 'teams', Team)
    table_player = f.create_table(f.root, 'players', Player)

    team = table_team.row
    team['id'] = 0
    team['name'] = 'Barcelona'
    team.append()

    player = table_player.row
    for player_name in ('De Jong', 'Fati'):
        player['team'] = 0           # both players belong to team 0
        player['name'] = player_name
        player.append()
# closing the file (end of the with block) flushes both tables
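Reading the children back from this layout means filtering the child table on its link column, one query per parent row, which is effectively a manual join. A minimal sketch using Table.read_where, assuming the same teams/players layout; the helper names build_file and players_by_team are hypothetical:

```python
import tables as tb


class Team(tb.IsDescription):
    id = tb.Int32Col()        # unique id of the team
    name = tb.StringCol(20)   # name of the team


class Player(tb.IsDescription):
    team = tb.Int32Col()      # link to the parent: equals Team.id
    name = tb.StringCol(20)   # name of the player


def build_file(h5_path):
    """Hypothetical helper: write the two-table layout from the question."""
    with tb.open_file(h5_path, mode='w', title='test') as f:
        teams = f.create_table(f.root, 'teams', Team)
        players = f.create_table(f.root, 'players', Player)
        team = teams.row
        team['id'] = 0
        team['name'] = 'Barcelona'
        team.append()
        player = players.row
        for player_name in ('De Jong', 'Fati'):
            player['team'] = 0
            player['name'] = player_name
            player.append()


def players_by_team(h5_path):
    """Hypothetical helper: manual join mapping each team name to its players."""
    result = {}
    with tb.open_file(h5_path, mode='r') as f:
        players = f.root.players
        for team in f.root.teams.iterrows():
            # one in-kernel query on the child table per parent row;
            # condvars binds the placeholder 'tid' to this team's id
            rows = players.read_where('team == tid',
                                      condvars={'tid': team['id']})
            # StringCol values come back as bytes, so decode them
            result[team['name'].decode()] = [n.decode() for n in rows['name']]
    return result


build_file('test.h5')
print(players_by_team('test.h5'))   # {'Barcelona': ['De Jong', 'Fati']}
```

Each read_where call scans the whole players table unless the link column is indexed (for example with table.cols.team.create_index()), so the cost grows with the number of parents.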

However, the PyTables documentation (https://www.pytables.org/cookbook/hints_for_sql_users.html) says the following about this:

"You may have noticed that queries in PyTables only cover one table. In fact, there is no way of directly performing a join between two tables in PyTables (remember that it’s not a relational database)."

It then gives some workarounds for join queries. But, as they say, PyTables is not a relational database, so rather than modeling the data relationally and relying on those workarounds, I'd like to know:

What is the recommended/standard way of implementing a parent-child structure in PyTables?


Solution

  • Do you need a parent-child relationship for your use case at all? The HDF5 hierarchy itself can organize your experimental data: create a separate Table for each experiment, with one row per data point, and store the experiment metadata as attributes on that table. The table node then plays the role of the parent, so no link column or join is needed.

    I created a simple example with "dummy data" to demonstrate this schema.

    Note: for simplicity, I use NumPy to create the table data. First I define a dtype (exp_dt), then use it to create baseline "experimental data" as a NumPy structured array (exp_arr). The data for each table is produced by offsetting the temperature and pressure values of exp_arr into a second array (data), which is loaded into each table with the obj=data parameter of create_table. The example could instead define class Experiment(tb.IsDescription) and load the data row by row.

    Code below:

    import numpy as np
    import tables as tb

    # define table structure with a NumPy dtype
    exp_dt = np.dtype([('Time', float), ('Temp', float), ('Pres', float)])

    # create baseline dummy data (offset per experiment below)
    exp_arr = np.empty(shape=(11,), dtype=exp_dt)
    idx = np.arange(11)
    exp_arr['Time'] = idx / 10.
    exp_arr['Temp'] = idx**2 / 10.
    exp_arr['Pres'] = 2. * idx

    # empty structured array, refilled with each experiment's data
    data = np.empty(shape=(11,), dtype=exp_dt)

    # dummy metadata: experiment date, time and device
    date_list = ['11/17/2021', '11/19/2021', '11/23/2021']
    time_list = ['10:49:23', '08:14:25', '14:40:23']
    device_list = ['Hex 6500', 'Hex 4414', 'CMM 6950']

    with tb.open_file('SO_70082470.h5', 'w') as h5f:
        for i in range(1, 4):
            # create dummy data for THIS experiment
            data['Time'] = exp_arr['Time']
            data['Temp'] = exp_arr['Temp'] + i
            data['Pres'] = exp_arr['Pres'] + 2. * i
            # create one table per experiment and load its data
            tbl = h5f.create_table('/', f'Experiment_{i:03}', obj=data)
            # add 3 attributes: Date, Time and Device
            tbl.attrs['Date'] = date_list[i - 1]
            tbl.attrs['Time'] = time_list[i - 1]
            tbl.attrs['Device'] = device_list[i - 1]
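The NumPy route above can also be swapped for a table description class, as the note mentions. A minimal sketch of that row-by-row variant, assuming the same dummy values and metadata; the pos= arguments and the helper names write_experiments and list_experiments are my own choices, and list_experiments shows that the metadata is read straight off each table node, with no join:

```python
import tables as tb


class Experiment(tb.IsDescription):
    # pos= keeps this column order; without it PyTables orders columns by name
    Time = tb.Float64Col(pos=0)
    Temp = tb.Float64Col(pos=1)
    Pres = tb.Float64Col(pos=2)


# same dummy metadata as in the NumPy version
date_list = ['11/17/2021', '11/19/2021', '11/23/2021']
time_list = ['10:49:23', '08:14:25', '14:40:23']
device_list = ['Hex 6500', 'Hex 4414', 'CMM 6950']


def write_experiments(h5_path):
    """Hypothetical helper: one table per experiment, filled row by row."""
    with tb.open_file(h5_path, mode='w') as h5f:
        for i in range(1, 4):
            tbl = h5f.create_table('/', f'Experiment_{i:03}', Experiment)
            row = tbl.row
            for j in range(11):
                # same dummy values as exp_arr, offset by the experiment number
                row['Time'] = j / 10.
                row['Temp'] = j**2 / 10. + i
                row['Pres'] = 2. * j + 2. * i
                row.append()
            tbl.flush()
            # experiment metadata lives on the table node as attributes
            tbl.attrs['Date'] = date_list[i - 1]
            tbl.attrs['Time'] = time_list[i - 1]
            tbl.attrs['Device'] = device_list[i - 1]


def list_experiments(h5_path):
    """Hypothetical helper: (table name, Device attribute, row count) per table."""
    with tb.open_file(h5_path, mode='r') as h5f:
        return [(tbl.name, str(tbl.attrs['Device']), tbl.nrows)
                for tbl in h5f.iter_nodes('/', classname='Table')]


write_experiments('SO_70082470_rows.h5')
for name, device, nrows in list_experiments('SO_70082470_rows.h5'):
    print(name, device, nrows)
```

The row API is convenient when data arrives one record at a time; loading a whole array with obj= is usually faster when the data is already in memory.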
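Applied to the teams/players example from the question, the same layout would give one table per team, with one row per player and the team's own data stored as attributes. A sketch under that assumption; the /teams group, the team_000-style node names, the team_name attribute and the helper names are all hypothetical choices:

```python
import tables as tb


class Player(tb.IsDescription):
    # no link column: a player's team is the table that contains the row
    name = tb.StringCol(20)


def write_teams(h5_path, rosters):
    """Hypothetical helper: rosters maps team name -> list of player names."""
    with tb.open_file(h5_path, mode='w') as h5f:
        teams = h5f.create_group('/', 'teams', 'One table per team')
        for i, (team_name, player_names) in enumerate(rosters.items()):
            # generic node names avoid NaturalNameWarning for names like 'Real Madrid'
            tbl = h5f.create_table(teams, f'team_{i:03}', Player)
            tbl.attrs['team_name'] = team_name   # parent data stored on the node
            row = tbl.row
            for player_name in player_names:
                row['name'] = player_name
                row.append()
            tbl.flush()


def read_teams(h5_path):
    """Hypothetical helper: rebuild {team name: [player names]} by walking /teams."""
    with tb.open_file(h5_path, mode='r') as h5f:
        return {str(tbl.attrs['team_name']): [n.decode() for n in tbl.col('name')]
                for tbl in h5f.iter_nodes('/teams', classname='Table')}


write_teams('teams_tree.h5', {'Barcelona': ['De Jong', 'Fati']})
print(read_teams('teams_tree.h5'))   # {'Barcelona': ['De Jong', 'Fati']}
```

Fetching one team's players is now a plain read of one table. The trade-off is that a query across all players (for example, every player with a given name) needs a loop over the team tables instead of a single where on one table.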