I have an HDF5 file with several datasets in groups and subgroups. I want to get the path to a specific group or dataset by its name.
A good way to do it is shown in the h5py documentation: https://docs.h5py.org/en/stable/high/group.html
with h5py.File('File.hdf5','r') as hf:
    def find_Name (hf):
        if 'Name' in hf:
            return hf
    hf.visit(find_Name)
>>>'Group/subGroup/Name'
The problem with this solution is that I cannot change the name of the dataset/group being searched for with each call of hf.visit(find_Name).
How can I pass a new search string to the function with every call?
The following did not work:
with h5py.File('File.hdf5','r') as hf:
    def find_Name (hf,Name):
        if Name in hf:
            return hf
    Name = 'NameOfDataset'
    hf.visit(find_Name(hf,Name))
Thank you for your support!
Your second attempt fails because the h5py visitor methods (.visit() and .visititems()) call your function with a fixed set of arguments and don't pass any extra ones. (In other words, you can't simply add Name to the function's argument list. Note also that hf.visit(find_Name(hf,Name)) calls find_Name right away and hands its return value to .visit(), not the function itself.) However, creating a function to mimic their recursive behavior isn't complicated: you just need to recursively call the same function to descend into any groups you find. (PyTables has functions that support this, but I will save that for a different answer if you are interested.) Take a look at this code:
import h5py

with h5py.File('File.hdf5','r') as hf:
    def check_Name(Name,grp,prefix=''):
        for obj_name, obj in grp.items():
            path = f'{prefix}/{obj_name}'
            if Name == obj_name:
                return path
            elif isinstance(obj, h5py.Group): # test for group (go down)
                gpath = check_Name(Name, obj, path)
                if gpath:
                    return gpath

    Name = 'Name'
    path = check_Name(Name,hf)
    if path:
        print(f'{Name} Found: {path}')
    else:
        print(f'{Name} not Found')

    Name = 'NameOfDataset'
    path = check_Name(Name,hf)
    if path:
        print(f'{Name} Found: {path}')
    else:
        print(f'{Name} not Found')
Output for your schema is:
Name Found: /Group/subGroup/Name
NameOfDataset not Found
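If you would rather keep using .visit(), one possible workaround is to bind the search string before handing the function over, for example with functools.partial from the standard library. The sketch below is only an illustration: match_name and find_path are hypothetical helper names, and it compares the last path component exactly (unlike the substring test in the question).

```python
from functools import partial


def match_name(target, name):
    """Return `name` if its last path component equals `target`, else None."""
    if name.split('/')[-1] == target:
        return name
    return None


def find_path(filename, target):
    """Return the first matching path in the file, or None if nothing matches."""
    import h5py  # imported lazily so match_name() can be tried without h5py

    with h5py.File(filename, 'r') as hf:
        # .visit() passes only the object name; partial() pre-fills `target`.
        return hf.visit(partial(match_name, target))
```

Calling find_path('File.hdf5', 'NameOfDataset') would then search for a different name on each call. Keep in mind that .visit() reports names relative to the object it was called on, so a result looks like Group/subGroup/Name, without a leading slash.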
Note: check_Name() only returns the first occurrence of the input Name. You will need to convert it to a generator function if you need to find multiple occurrences. (FYI, .visit() has the same limitation: it stops as soon as the function returns anything other than None.)
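A generator version could look roughly like the sketch below. The name iter_paths is hypothetical, and the group test uses collections.abc.Mapping instead of h5py.Group. That is an assumption on my part (h5py groups should behave as mappings, datasets should not), made so the demo can run on nested dicts without an HDF5 file. With a real file you would pass the open h5py.File object as grp, and you can swap the test back to isinstance(obj, h5py.Group) if you prefer.

```python
from collections.abc import Mapping


def iter_paths(target, grp, prefix=''):
    """Yield the path of every object under `grp` whose name equals `target`."""
    for obj_name, obj in grp.items():
        path = f'{prefix}/{obj_name}'
        if obj_name == target:
            yield path
        # Assumption: groups are Mapping-like (h5py.Group should qualify);
        # datasets are not, so the recursion stops at them.
        if isinstance(obj, Mapping):
            yield from iter_paths(target, obj, path)


# Nested dicts stand in for an HDF5 file so the sketch runs without one.
demo = {
    'Group': {'subGroup': {'Name': 1}, 'Name': 2},
    'Other': {'Name': {'Name': 3}},
}
matches = list(iter_paths('Name', demo))
print(matches)
```

Unlike check_Name(), this version also descends into a group whose own name matches, so nested matches are reported as well.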