Tags: hdf5, pytables, external-links

How to de-reference a list of external links using PyTables?


I have created external links from one HDF5 file to another using PyTables. How do I de-reference them in a loop?

For example:

Let's assume file_name = "collection.h5" is the file where the external links are stored.

I created the external links under the root node, and when I traverse the nodes under the root I get the following output:

/link1 (ExternalLink) -> /files/data1.h5:/weights/Image
/link2 (ExternalLink) -> /files/data2.h5:/weights/Image

and so on.
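Each entry in that listing pairs a link with a target string of the form `file:node-path`. If you need the two halves separately, for example to check which file a link points to before dereferencing it, plain string handling is enough. `split_link_target` is a hypothetical helper, not part of PyTables, and it assumes the node path itself never contains `:/`.

```python
def split_link_target(target):
    """Hypothetical helper (not a PyTables API): split an external link
    target such as '/files/data1.h5:/weights/Image' into
    (file path, node path). It splits at the LAST ':/' so a Windows-style
    file path like 'C:/data/x.h5' stays intact."""
    filename, sep, node_path = target.rpartition(':/')
    if not sep:
        raise ValueError('not an external link target: %r' % target)
    return filename, '/' + node_path


# The two targets from the listing above
for link, target in [('/link1', '/files/data1.h5:/weights/Image'),
                     ('/link2', '/files/data2.h5:/weights/Image')]:
    print(link, '->', split_link_target(target))
```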

I know that a single link can be de-referenced with natural naming, like this:

    from tables import open_file

    f = open_file('collection.h5', mode='r')
    plink1 = f.root.link1()  # calling the link dereferences /link1
    plink2 = f.root.link2()  # calling the link dereferences /link2

But I want to do this in a for loop. How can I do that?
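Natural naming is plain attribute access, so `f.root.link1()` can be generated in a loop with `getattr(f.root, name)` when the link names are known in advance. The sketch below assumes exactly that; the setup function `make_collection`, the `data<i>.h5` file names and the absolute link targets are hypothetical, chosen only so the example runs on its own. Calling an `ExternalLink` opens the linked file and returns the target node, and `umount()` closes that file again.

```python
import os
import tempfile

import numpy as np
import tables as tb


def make_collection(tmpdir, n=2):
    """Hypothetical setup: data<i>.h5 files plus collection.h5 with
    /link<i> -> <tmpdir>/data<i>.h5:/weights/Image (absolute targets)."""
    coll = os.path.join(tmpdir, 'collection.h5')
    with tb.open_file(coll, mode='w') as cf:
        for i in range(1, n + 1):
            path = os.path.join(tmpdir, 'data%d.h5' % i)
            with tb.open_file(path, mode='w') as df:
                df.create_array('/weights', 'Image', obj=np.arange(3) * i,
                                createparents=True)
            cf.create_external_link('/', 'link%d' % i, path + ':/weights/Image')
    return coll


def read_links_by_name(coll_path, names):
    """Dereference each named link under the root; return {name: data}."""
    out = {}
    with tb.open_file(coll_path, mode='r') as f:
        for name in names:
            link = getattr(f.root, name)  # same object as f.root.link1, f.root.link2, ...
            target = link()               # opens the linked file, returns the target node
            out[name] = target.read()     # copy the data before closing the file
            link.umount()                 # close the linked file again
    return out


with tempfile.TemporaryDirectory() as tmpdir:
    coll = make_collection(tmpdir)
    for name, arr in read_links_by_name(coll, ['link1', 'link2']).items():
        print(name, arr.tolist())
```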


Solution

  • This is a more complete (more robust, but more complicated) answer that handles the general case, where an ExternalLink can sit at any group level. It is similar to the simpler root-level approach, but uses walk_nodes() because this example has 3 groups at the root level, and it tests each node for the ExternalLink type with isinstance(). It also shows how to use the _v_children attribute to get a dictionary of a group's child nodes. (I couldn't get list_nodes() to work with an ExternalLink.)

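    import glob

    import tables as tb

    # Build collection.h5: one group per file prefix, each holding external
    # links to the root group of every matching data file.
    h5f = tb.open_file('collection.h5', mode='w')
    link_cnt = 0
    pre_list = ['SO_53', 'SO_54', 'SO_55']
    for h5f_pre in pre_list:
        h5f_pre_grp = h5f.create_group('/', h5f_pre)
        # bare names in the current directory, sorted for stable link numbering
        for h5name in sorted(glob.glob(h5f_pre + '*.h5')):
            link_cnt += 1
            h5f.create_external_link(h5f_pre_grp, 'link_%02d' % link_cnt, h5name + ':/')
    h5f.close()

    # Walk every node and dereference only the ExternalLinks.
    h5f = tb.open_file('collection.h5', mode='r')
    for link_node in h5f.walk_nodes('/'):
        if isinstance(link_node, tb.link.ExternalLink):
            print('\nFor Node %s:' % link_node._v_pathname)
            print('``%s`` is an external link to: ``%s``' % (link_node, link_node.target))
            # Calling the link opens the linked file and returns the target node
            # (here that file's root group, because the target ends in ':/').
            plink = link_node(mode='r')
            linked_nodes = plink._v_children  # dictionary of the target group's children
            print(linked_nodes)
            link_node.umount()  # close the linked file opened by the call above

    h5f.close()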
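For the simpler layout in the question, where every link hangs directly from the root group, the walk is not needed. A minimal sketch, assuming the link names are not known in advance: iterate over the root's children with `File.iter_nodes('/')` and dereference only the `ExternalLink` instances. The setup function `build_root_links`, the `data<i>.h5` file names, the extra `not_a_link` array and the absolute link targets are hypothetical, added only so the example is self-contained.

```python
import os
import tempfile

import numpy as np
import tables as tb


def build_root_links(tmpdir, n_files=2):
    """Hypothetical setup: data<i>.h5 files, one ordinary array, and one
    ExternalLink per data file directly under collection.h5's root."""
    coll = os.path.join(tmpdir, 'collection.h5')
    with tb.open_file(coll, mode='w') as cf:
        cf.create_array('/', 'not_a_link', obj=np.zeros(1))  # ordinary node to skip
        for i in range(1, n_files + 1):
            path = os.path.join(tmpdir, 'data%d.h5' % i)
            with tb.open_file(path, mode='w') as df:
                df.create_array('/weights', 'Image', obj=np.full((2, 2), i),
                                createparents=True)
            cf.create_external_link('/', 'link%d' % i, path + ':/weights/Image')
    return coll


def dereference_root_links(coll_path):
    """Dereference every ExternalLink directly under '/' without knowing
    the link names; return {link name: data read from the target node}."""
    results = {}
    with tb.open_file(coll_path, mode='r') as f:
        for node in f.iter_nodes('/'):  # direct children of the root only
            if not isinstance(node, tb.link.ExternalLink):
                continue                 # skip ordinary groups and arrays
            target = node()              # opens the linked file, returns the target node
            results[node._v_name] = target.read()
            node.umount()                # close the linked file again
    return results


with tempfile.TemporaryDirectory() as tmpdir:
    for name, arr in dereference_root_links(build_root_links(tmpdir)).items():
        print(name, arr.tolist())
```

Filtering with isinstance() rather than relying on naming conventions means the loop keeps working if ordinary groups or arrays are later added next to the links.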