python, visual-studio-code, hdf5, h5py, python-3.11

h5py: HDF5 dataset name conflicts with group name


Note: This is @hamo's original question.
The question was updated/clarified as an "answer" here: https://stackoverflow.com/a/77574693/10462884.
The associated answer is here: https://stackoverflow.com/a/77575182/10462884


I have been working on this code for hours without finding a solution. The program does not find the path in the file, so it tries to create a new dataset. However, it then throws the following TypeError:

TypeError: "Incompatible object (Dataset) already exists"

If I instead try to update the value via dset = file[hdf5_path], it throws this error:

Unable to open object (message type not found)

This code generates the problem above:

    hdf5_path = "path/to/my/dataset"
    with h5py.File(hdf5_path, "r+") as file:
        if hdf5_path in file:
            dset = file[hdf5_path]
            dset[...] = new_data_set
        else:
            file.create_dataset(hdf5_path, data=pixel_count)

The following code instead fails on the second iteration of the loop, i.e. when creating the group "my_path/to_another", with the error "Unable to create group (message type not found)":

    import h5py
    data_set = [1,2,3,4]
    fname = "temp.h5"
    h5_path = "my_path"
    with h5py.File(fname, "r+") as file:
        if h5_path in file:
            dset = file[h5_path] 
            dset[...] = data_set
        else:
            file.create_dataset(h5_path, data=data_set)

    h5_path = "my_path/to_another/dest"
    with h5py.File(fname, "r+") as file:
        current_path = ""
        for part in h5_path.split('/')[:-1]:
            current_path = f"{current_path}/{part}" if current_path else part
            if current_path not in file:
                file.create_group(current_path)
        if h5_path in file:
            dset = file[h5_path] 
            dset[...] = data_set
        else:
            file.create_dataset(h5_path, data=data_set)

Could it be that the files are corrupted?


Solution

  • This answer is a follow-up to @hamo's answer describing the purported issue. As you discovered, you have to check whether file[dset_tag] is a dataset, but you also have to check that every name along the path is a group (except the last one, which names the dataset). HDF5 cannot create a group or dataset beneath an existing dataset, so the logic gets a little trickier. I addressed it with a simple function that checks that none of the path names are datasets; you can then use it in the original logic. I modified your example to create a file that mimics your problem: the file is first created with a dataset named post/cams/thermal, and the code then tries to create a dataset named post/cams/thermal/pixels.

    Code below:

    import h5py

    def group_path_ok(file, dset_tag):
        """Return True if no group name on the way to dset_tag is a dataset."""
        names = dset_tag.split('/')
        group_path = ''
        # Check every name except the last, which is the dataset itself.
        for name in names[:-1]:
            group_path += '/' + name
            if group_path in file and isinstance(file[group_path], h5py.Dataset):
                print(f'group name: {group_path} in path is a dataset')
                return False
        return True


    fname = "copyfile.h5"
    pixel_count = list(range(10))
    dset_tag = "post/cams/thermal"

    # Create a file in which "post/cams/thermal" is a dataset.
    with h5py.File(fname, "w") as file:
        file.create_dataset(dset_tag, data=pixel_count)

    pixel_count = list(range(17))
    dset_tag = "post/cams/thermal/pixels"
    # "thermal" is already a dataset, so this path is rejected.
    with h5py.File(fname, "r+") as file:
        if group_path_ok(file, dset_tag):
            if dset_tag in file:
                del file[dset_tag]
                print("Dataset deleted")
            file.create_dataset(dset_tag, data=pixel_count)
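
    If it helps to know which name is causing the conflict instead of getting only True/False, the same walk along the path can return the offending name. The following is only a sketch of that idea: first_dataset_on_path is a hypothetical helper name (not part of h5py), and the example uses an in-memory file (the core driver with backing_store=False) so nothing is written to disk.

```python
import h5py


def first_dataset_on_path(h5file, dset_path):
    """Return the first parent name on dset_path that is a dataset, else None.

    Hypothetical helper for illustration; not part of the h5py API.
    """
    prefix = ""
    # Walk every name except the last one, which names the dataset itself.
    for name in dset_path.strip("/").split("/")[:-1]:
        prefix = f"{prefix}/{name}"
        if prefix not in h5file:
            # Nothing exists here yet, so nothing deeper can conflict.
            return None
        if isinstance(h5file[prefix], h5py.Dataset):
            return prefix
    return None


# In-memory file: the core driver without a backing store writes nothing to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("post/cams/thermal", data=list(range(10)))

    # "thermal" is a dataset, so it blocks anything placed beneath it.
    blocked = first_dataset_on_path(f, "post/cams/thermal/pixels")
    # "visible" does not exist yet, so this path is free to use.
    free = first_dataset_on_path(f, "post/cams/visible/pixels")

    print(blocked, free)
```

    Returning the name makes it easy to decide what to do next, for example picking a different dataset path or deliberately deleting the blocking dataset before creating a group in its place.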