I have multiple HDF5 (h5py) files (pixel-level annotations) for one image. The image masks are stored in the HDF5 files as key-value pairs, with the key being the id of a class. The masks all match the dimensions of their corresponding image and represent labels for the pixels in that image. I need to compare all the h5 files with one another and find the final mask that represents the majority, but I don't know how to compare multiple h5 files in Python. Can someone kindly help?
What do you mean by "compare"?
If you just want to compare the files to see if they are the same, you can use the h5diff utility from The HDF Group. It comes with the HDF5 installer. You can get more info about h5diff here: h5diff utility. Links to all HDF5 utilities are at the top of the page: HDF5 Tools.
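For a quick equality check from Python instead of the command line, here is a minimal sketch. It treats each file as a dict-like mapping of dataset name to array. An open `h5py.File` behaves this way (it supports `.keys()` and item access, and `np.asarray()` reads a dataset into memory), so the function itself only needs NumPy. The name `same_masks` is hypothetical, and unlike h5diff it only compares top-level datasets, not attributes or nested groups.

```python
import numpy as np


def same_masks(a, b):
    """Return True if two dict-like mask collections hold identical data.

    `a` and `b` can be open h5py.File objects or plain dicts of arrays.
    Only top-level keys are compared; attributes and nested groups are ignored.
    """
    # Different sets of class ids means the files cannot match.
    if set(a.keys()) != set(b.keys()):
        return False
    for key in a.keys():
        # np.array_equal checks both shape and values;
        # np.asarray reads an h5py Dataset fully into memory.
        if not np.array_equal(np.asarray(a[key]), np.asarray(b[key])):
            return False
    return True
```

With real files you would open both with `h5py.File(path, 'r')` in a `with` statement and pass the two open file objects in.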
It sounds like you need to do more than that. Please clarify what you mean by "find out the final mask that represents the majority". Do you want to find the average image size (mean, median, or mode)? If so, it is relatively straightforward (if you know Python) to open each file and get the dimensions of the image data (the shape of each dataset, which is what you call the values). For reference, the key, value terminology is how h5py refers to HDF5 dataset names and datasets.
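If "majority" instead means a pixel-wise majority vote across annotators, a minimal sketch is below. It assumes each file maps a class id to a binary mask (nonzero means the pixel belongs to that class), and that every file has the same class ids and mask shape. The function name `majority_masks` is hypothetical. A pixel is kept only when strictly more than half of the annotators marked it, so with an even number of files a tie is dropped.

```python
import numpy as np


def majority_masks(annotations):
    """Pixel-wise majority vote over several annotators' masks.

    `annotations` is a list of dict-like objects (e.g. open h5py.File
    objects), one per annotator, each mapping a class id to a mask.
    Assumes binary masks and identical class ids and shapes in every file.
    """
    n = len(annotations)
    result = {}
    for class_id in annotations[0].keys():
        # Stack this class's mask from every annotator: shape (n, H, W).
        stack = np.stack([np.asarray(ann[class_id]) != 0 for ann in annotations])
        # Count how many annotators marked each pixel.
        votes = stack.sum(axis=0)
        # Strict majority: more than half of the annotators agree.
        result[class_id] = votes * 2 > n
    return result
```

To use it with files on disk, open every file with `h5py.File(path, 'r')` (for example inside a `contextlib.ExitStack` so they all get closed) and pass the list of open files.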
Here is a basic outline of the process to open one HDF5 file and loop through the datasets (by key name) to get each dataset's shape (image size). For multiple files, you can add a for loop using the glob.iglob iterator to get the HDF5 file names. For simplicity, I saved the shape values to 3 lists and manually calculated the mean (sum()/len()). If you want to calculate it differently (median or mode instead of mean), I suggest using NumPy arrays; NumPy has mean and median functions built in. For mode, you need the scipy.stats module (it works on NumPy arrays).
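As a sketch of the NumPy route, the hypothetical helper below takes one list of lengths (such as `s0_list` from the methods that follow) and returns its mean, median, and mode. To avoid the SciPy dependency, the mode here is computed with `np.unique(..., return_counts=True)` rather than `scipy.stats.mode`.

```python
import numpy as np


def summarize(values):
    """Return (mean, median, mode) of a list of dataset lengths.

    The mode uses np.unique with return_counts instead of scipy.stats.mode;
    on ties it picks the smallest value, because np.unique returns sorted values.
    """
    arr = np.asarray(values)
    uniques, counts = np.unique(arr, return_counts=True)
    mode = uniques[np.argmax(counts)]
    return float(np.mean(arr)), float(np.median(arr)), int(mode)
```

For example, `summarize([512, 512, 480, 640])` returns `(536.0, 512.0, 512)`.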
Method 1: iterates on .keys()
import h5py

filename = 'masks.h5'  # hypothetical file name; replace with your own
s0_list = []
s1_list = []
s2_list = []
with h5py.File(filename, 'r') as h5f:
    for name in h5f.keys():
        # assumes every dataset is 3-D, so shape[2] exists
        shape = h5f[name].shape
        s0_list.append(shape[0])
        s1_list.append(shape[1])
        s2_list.append(shape[2])
print('Ave len axis=0:', sum(s0_list) / len(s0_list))
print('Ave len axis=1:', sum(s1_list) / len(s1_list))
print('Ave len axis=2:', sum(s2_list) / len(s2_list))
Method 2: iterates on .items()
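import h5py

filename = 'masks.h5'  # hypothetical file name; replace with your own
s0_list = []
s1_list = []
s2_list = []
with h5py.File(filename, 'r') as h5f:
    # .items() yields (name, dataset) pairs, so no second lookup is needed
    for name, ds in h5f.items():
        shape = ds.shape  # assumes every dataset is 3-D
        s0_list.append(shape[0])
        s1_list.append(shape[1])
        s2_list.append(shape[2])
print('Ave len axis=0:', sum(s0_list) / len(s0_list))
print('Ave len axis=1:', sum(s1_list) / len(s1_list))
print('Ave len axis=2:', sum(s2_list) / len(s2_list))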
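Both methods assume exactly three axes and a single file. A minimal sketch that relaxes both is below: it takes several dict-like objects (open `h5py.File` objects work, since they provide `.values()`), collects every dataset's shape into one NumPy array, and averages each axis. The name `average_shape` is hypothetical, and it assumes the top level of each file holds only datasets (an h5py Group has no `.shape`) and that all datasets share the same number of axes.

```python
import numpy as np


def average_shape(files):
    """Per-axis mean shape of every dataset across several files.

    `files` is an iterable of dict-like objects (e.g. open h5py.File
    objects) whose top-level values are datasets or arrays.
    Works for any rank, as long as all datasets have the same rank.
    """
    shapes = [ds.shape for f in files for ds in f.values()]
    if not shapes:
        raise ValueError("no datasets found")
    ranks = {len(s) for s in shapes}
    if len(ranks) != 1:
        raise ValueError(f"datasets have mixed ranks: {sorted(ranks)}")
    # One row per dataset, one column per axis; average down each column.
    return np.array(shapes).mean(axis=0)
```

With files on disk, open each path from `glob.iglob('*.h5')` with `h5py.File(path, 'r')`, pass the open files in, and close them afterwards.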