python hdf5 pytables

How to extract data from HDF5 file to fill PyTables table?


I am trying to write a Discord bot in Python. The goal of the bot is to fill a table with entries submitted by users, each entry holding a username, a game name and a game password. Specific users should then be able to extract this data and remove the entries that have been handled. I picked the first table-management tool I found on Google, PyTables. I'm able to fill a table in an HDF5 file, but I am unable to read the entries back.

It may be relevant that I have never coded in Python before.

This is how I declare my object and create a file to store entries.

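import tables
from tables import StringCol, open_file

# Row layout: three fixed-width byte-string columns
class DCParties(tables.IsDescription):
    user_name = StringCol(32)
    game_name = StringCol(16)
    game_pswd = StringCol(16)


# Create the file, a group, and an empty table inside that group
h5file = open_file("DCloneTable.h5", mode="w", title="DClone Table")
group = h5file.create_group("/", 'DCloneEntries', "Entries for DClone runs")
table = h5file.create_table(group, 'Entries', DCParties, "Entrées")
h5file.close()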

This is how I fill entries

h5file = open_file("DCloneTable.h5", mode="a")
table = h5file.root.DCloneEntries.Entries

particle = table.row
particle['user_name'] = member.author
particle['game_name'] = game_name
particle['game_pswd'] = game_pswd
particle.append()

table.flush()
h5file.close()

All of this works, and I can see my entries in the table when I open the file with an HDF5 viewer. But when I try to read the table back from the file to extract the data, it doesn't work.

h5file = open_file("DCloneTable.h5", mode="a")
table = h5file.root.DCloneEntries.Entries

particle = table.row

"""???"""

h5file.close()

I tried using particle["user_name"] (because a bare user_name isn't defined), and it gives me b'' as output:

h5file = open_file("DCloneTable.h5", mode="a")
table = h5file.root.DCloneEntries.Entries

particle = table.row
print(f'{particle["user_name"]}')

h5file.close()

b''

And if I do

h5file = open_file("DCloneTable.h5", mode="a")
table = h5file.root.DCloneEntries.Entries

particle = table.row
print(f'{particle["user_name"]} - {particle["game_name"]} - {particle["game_pswd"]}')

h5file.close()

b'' - b'' - b''

Where am I going wrong? Many thanks in advance :)


Solution

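  • The Row object you get from table.row is meant for appending new rows: until you iterate, it only holds the default (empty) values of a fresh row, which is why you see b''. To read stored data, iterate over the table or index it. Here is a simple method to iterate over the table rows and print them one at a time. PyTables string columns (StringCol) hold fixed-width byte strings rather than Unicode, so your character data comes back as bytes. That's why you see the 'b'. To get rid of the 'b', you have to convert back to Unicode using .decode('utf-8'). This example uses your hard-coded field names; you could use the values from table.colnames to handle any column names. I also recommend using Python's context manager (with/as) so the file is always closed.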

    import tables as tb
    
    with tb.open_file("DCloneTable.h5", mode="r") as h5file:
        table = h5file.root.DCloneEntries.Entries
        print(f'Table Column Names: {table.colnames}')
    
        # Method to iterate over rows
        for row in table:
            print(f"{row['user_name'].decode('utf-8')} - " +
                  f"{row['game_name'].decode('utf-8')} - " +
                  f"{row['game_pswd'].decode('utf-8')}" )
    
        # Method to only read the first row, aka table[0]
        print(f"{table[0]['user_name'].decode('utf-8')} - " +
              f"{table[0]['game_name'].decode('utf-8')} - " +
              f"{table[0]['game_pswd'].decode('utf-8')}" )
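
The table.colnames idea mentioned above could be sketched roughly as follows. This is only an illustration, not part of the original answer: the helper name rows_as_dicts, the temporary demo file and the sample values are all made up so the snippet can run on its own, and it assumes every column is a string column.

```python
import os
import tempfile

import tables as tb


class DCParties(tb.IsDescription):
    # Same layout as the table in the question
    user_name = tb.StringCol(32)
    game_name = tb.StringCol(16)
    game_pswd = tb.StringCol(16)


def rows_as_dicts(table):
    """Hypothetical helper: one dict of decoded strings per row.

    Assumes every column holds byte strings (StringCol).
    """
    return [
        {name: row[name].decode("utf-8") for name in table.colnames}
        for row in table
    ]


# Build a small throwaway file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo_colnames.h5")
with tb.open_file(path, mode="w") as h5file:
    group = h5file.create_group("/", "DCloneEntries")
    table = h5file.create_table(group, "Entries", DCParties)
    row = table.row
    row["user_name"] = "alice".encode("utf-8")  # made-up sample values
    row["game_name"] = "run1".encode("utf-8")
    row["game_pswd"] = "pw1".encode("utf-8")
    row.append()
    table.flush()

# Read back without hard-coding any column name.
with tb.open_file(path, mode="r") as h5file:
    entries = rows_as_dicts(h5file.root.DCloneEntries.Entries)

for entry in entries:
    print(entry)
```

Because the values are copied out inside the loop, the resulting list stays usable after the file is closed.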
    

    If you prefer to read all the data at one time, you can use the table.read() method to load the data into a NumPy structured array. You still have to convert from bytes to Unicode. As a result it is "slightly more complicated", so I didn't post that method.
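
For completeness, a minimal sketch of the table.read() approach might look like the following. It is an illustration under assumptions rather than the answerer's own code: the demo file and the sample entries are invented so the snippet runs on its own, and the decoding step is just one possible way to turn the byte columns into Python strings.

```python
import os
import tempfile

import tables as tb


class DCParties(tb.IsDescription):
    # Same layout as the table in the question
    user_name = tb.StringCol(32)
    game_name = tb.StringCol(16)
    game_pswd = tb.StringCol(16)


# Build a small throwaway file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo_read.h5")
samples = [("alice", "run1", "pw1"), ("bob", "run2", "pw2")]  # made-up data
with tb.open_file(path, mode="w") as h5file:
    group = h5file.create_group("/", "DCloneEntries")
    table = h5file.create_table(group, "Entries", DCParties)
    row = table.row
    for user, game, pswd in samples:
        row["user_name"] = user.encode("utf-8")
        row["game_name"] = game.encode("utf-8")
        row["game_pswd"] = pswd.encode("utf-8")
        row.append()
    table.flush()

# Load the whole table at once as a NumPy structured array.
with tb.open_file(path, mode="r") as h5file:
    data = h5file.root.DCloneEntries.Entries.read()

# Each field is an array of bytes; decode it column by column.
decoded = {
    name: [value.decode("utf-8") for value in data[name]]
    for name in data.dtype.names
}

for user, game, pswd in zip(
    decoded["user_name"], decoded["game_name"], decoded["game_pswd"]
):
    print(f"{user} - {game} - {pswd}")
```

Reading everything in one call is convenient for small tables like this one; for very large tables, iterating row by row keeps memory use lower.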