I have never worked with HDF5 files before, and to get started I received some example files. I've been checking out all the basics with h5py, looking at the different groups in these files, their names, keys, values and so on. Everything works fine until I want to look at the datasets that are saved in the groups. I get their .shape and .dtype, but when I try accessing a random value by indexing (e.g. grp["dset"][0]), I get the following error:
IOError Traceback (most recent call last)
<ipython-input-45-509cebb66565> in <module>()
1 print geno["matrix"].shape
2 print geno["matrix"].dtype
----> 3 geno["matrix"][0]
/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_hl/dataset.pyc in __getitem__(self, args)
443 mspace = h5s.create_simple(mshape)
444 fspace = selection._id
--> 445 self.id.read(mspace, fspace, arr, mtype)
446
447 # Patch up the output for NumPy
/home/sarah/anaconda/lib/python2.7/site-packages/h5py/h5d.so in h5py.h5d.DatasetID.read (h5py/h5d.c:2782)()
/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_proxy.so in h5py._proxy.dset_rw (h5py/_proxy.c:1709)()
/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_proxy.so in h5py._proxy.H5PY_H5Dread (h5py/_proxy.c:1379)()
IOError: Can't read data (Can't open directory)
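For reference, this is roughly what my session looks like up to that point (a minimal sketch; the file name is a placeholder, the group and dataset names match the traceback above):

import h5py

# Open one of the example files read-only; "example.h5" is a placeholder.
f = h5py.File("example.h5", "r")

# Browsing the hierarchy works fine:
print(f.keys())              # group names at the root
geno = f["geno"]             # the group from the traceback
print(geno["matrix"].shape)  # works
print(geno["matrix"].dtype)  # works

# But reading any actual data raises the IOError shown above:
geno["matrix"][0]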
I've posted this problem in the h5py Google group, where it was suggested that the dataset might use a filter that isn't installed on my system. But the HDF5 file was created with gzip compression only, which, as far as I understand, is a portable standard.
Does anyone know what I might be missing here? I can't find a description of this error or a similar problem anywhere, and the file, including the problematic dataset, opens without trouble in the HDFView software.
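If it helps, the dataset's storage settings can be inspected without reading any data, which is how I confirmed the compression (a sketch; the file name and paths are placeholders for my actual file):

import h5py

with h5py.File("example.h5", "r") as f:
    dset = f["geno/matrix"]
    print(dset.compression)       # expected to be 'gzip' here
    print(dset.compression_opts)  # the gzip level, 4 by default
    print(dset.chunks)            # compressed datasets are always chunked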
Edit
Apparently, this error occurs because, for some reason, the gzip compression filter is not available on my system. If I try to create an example file with gzip compression, this happens:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-33-dd7b9e3b6314> in <module>()
1 grp = f.create_group("subgroup")
----> 2 grp_dset = grp.create_dataset("dataset", (50,), dtype="uint8", chunks=True, compression="gzip")
/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_hl/group.pyc in create_dataset(self, name, shape, dtype, data, **kwds)
92 """
93
---> 94 dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
95 dset = dataset.Dataset(dsid)
96 if name is not None:
/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_hl/dataset.pyc in make_new_dset(parent, shape, dtype, data, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times)
97
98 dcpl = filters.generate_dcpl(shape, dtype, chunks, compression, compression_opts,
---> 99 shuffle, fletcher32, maxshape, scaleoffset)
100
101 if fillvalue is not None:
/home/sarah/anaconda/lib/python2.7/site-packages/h5py/_hl/filters.pyc in generate_dcpl(shape, dtype, chunks, compression, compression_opts, shuffle, fletcher32, maxshape, scaleoffset)
101
102 if compression not in encode:
--> 103 raise ValueError('Compression filter "%s" is unavailable' % compression)
104
105 if compression == 'gzip':
ValueError: Compression filter "gzip" is unavailable
Does anyone have experience with this? Neither the installation of the HDF5 library nor that of the h5py package seemed to go wrong...
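One way to confirm whether the gzip (deflate) filter is actually available in the HDF5 library that h5py was built against is the low-level h5z module (a sketch; this is only a diagnostic, not a fix):

import h5py
from h5py import h5z

# Ask the underlying HDF5 library whether the deflate (gzip) filter
# was compiled in; on a working installation this prints True.
print(h5z.filter_avail(h5z.FILTER_DEFLATE))

# Version information is useful when reporting the problem:
print(h5py.version.version)       # h5py version
print(h5py.version.hdf5_version)  # HDF5 library version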
Can't just comment (reputation too low), but I had the same issue: I simply ran "conda update anaconda" and the problem was gone.