I have a very large HDF5 file and wish to read a small subset of it using FORTRAN. My attempts thus far have failed and I'm confused by the documentation. Any pointers you could give to a FORTRAN newbie (but reasonable C/Python coder) would be much appreciated.
In particular, I'm having real difficulty understanding what the dataspace and memory space are. In my code they don't seem to be doing what I expect based upon the documentation I've read. That's probably my own idiocy though!
This is what I am trying:
integer, allocatable :: tmpdata(:,:) ! Array to contain data subset
integer(HID_T) :: fid ! HDF5 File ID
integer(HID_T) :: did ! Dataset ID
integer :: error ! Error variable
integer(HSIZE_T), dimension(1:2) :: count ! Number of px to read (x,y)
integer(HSIZE_T), dimension(1:2) :: offset ! Starting point for read (x,y)
integer(HID_T) :: dataspace ! Dataspace
integer(HID_T) :: memspace ! Memoryspace
offset=(/58000,22000/) ! Set offset in 2d dataset
count=(/1200,1200/) ! Set # pixels to read (1200x1200 slab)
allocate(tmpdata(1200,1200)) ! Allocate space to store this slab
call h5open_f(error)
call h5fopen_f ("myfile.h5", H5F_ACC_RDWR_F, fid, error) ! Open HDF5 file
call h5dopen_f(fid, "mydataset", did, error) ! Open dataset
call h5dget_space_f(did, dataspace, error) ! Retrieve dataspace
call h5screate_simple_f(2, count, memspace, error) !Create memspace, rank=2,size=1200x1200
call h5sselect_hyperslab_f(dataspace, H5S_SELECT_SET_F, offset, count, error) ! Select slab in the data
call h5dread_f(did, H5T_NATIVE_INTEGER, tmpdata, dimsm, error,memspace,dataspace) ! Read the data from the HDF5 file into the tmpdata array
! Close everything
-snip-
Everything goes OK up until the h5dread_f call. Then I get a segfault. If I allocate tmpdata with the size of the actual dataset in the HDF5 file then it works, but this isn't a good solution, as for some files the dataset will be too large to store in memory. Any ideas? Hopefully I'm simply doing something dumb. In case it's important, I'm compiling with ifort and HDF5-1.8.15 Patch 1 on Ubuntu 14.04.
Based on your description and the omission of the variable dimsm from the posted code, I'm going to guess you have that variable set to the full dimensions of the dataset you are reading. Those dimensions should instead be the dimensions of the hyperslab you set up, which are also the dimensions of tmpdata, the variable that holds the results of the read. You can use the same array that you use to create the hyperslab, count, in the call to read.
Change:
call h5dread_f(did, H5T_NATIVE_INTEGER, tmpdata, dimsm, error,memspace,dataspace)
to
call h5dread_f(did, H5T_NATIVE_INTEGER, tmpdata, count, error,memspace,dataspace)
And your read should work.
What was happening (if my assumption is correct) is that dimsm contains the full dataset dimensions (or anything greater than 1200 in both dimensions), so the call to read the data attempts to read outside the bounds of the selected dataspace (which was set to a 1200x1200 view at your given offsets). If that isn't what causes the segfault, then it is the attempt to put a read larger than 1200x1200 into the 1200x1200 variable tmpdata.
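Putting the pieces together, here is a minimal sketch of the whole hyperslab read with count passed as the dims argument. It reuses the file name, dataset name, offsets and sizes from the question. The read-only access flag (H5F_ACC_RDONLY_F instead of H5F_ACC_RDWR_F), the error checks and the program name are my own additions rather than part of the original code. On the dataspace/memspace confusion: the file dataspace, after the hyperslab selection, describes which elements of the dataset on disk are read, while the memory space describes the shape of the buffer that receives them. Note that hyperslab offsets are zero-based, even from Fortran.

```fortran
program read_slab
  use hdf5
  implicit none

  integer, allocatable :: tmpdata(:,:)   ! Buffer for the subset only
  integer(HID_T)   :: fid                ! File ID
  integer(HID_T)   :: did                ! Dataset ID
  integer(HID_T)   :: dataspace          ! Selection in the file
  integer(HID_T)   :: memspace           ! Shape of the in-memory buffer
  integer(HSIZE_T) :: count(2)           ! Size of the slab
  integer(HSIZE_T) :: offset(2)          ! Zero-based start of the slab
  integer          :: error

  offset = (/ 58000, 22000 /)            ! Values taken from the question
  count  = (/ 1200, 1200 /)
  allocate(tmpdata(count(1), count(2)))

  call h5open_f(error)
  ! Assumption: read-only access is enough for this task
  call h5fopen_f("myfile.h5", H5F_ACC_RDONLY_F, fid, error)
  if (error /= 0) stop "could not open file"
  call h5dopen_f(fid, "mydataset", did, error)
  if (error /= 0) stop "could not open dataset"

  ! File side: take the dataset's dataspace and select the slab in it
  call h5dget_space_f(did, dataspace, error)
  call h5sselect_hyperslab_f(dataspace, H5S_SELECT_SET_F, offset, count, error)
  if (error /= 0) stop "hyperslab selection failed"

  ! Memory side: a simple 2-D space with the same shape as tmpdata
  call h5screate_simple_f(2, count, memspace, error)

  ! dims argument is count (the slab size), not the full dataset size
  call h5dread_f(did, H5T_NATIVE_INTEGER, tmpdata, count, error, memspace, dataspace)
  if (error /= 0) stop "read failed"

  ! Release resources in reverse order of creation
  call h5sclose_f(memspace, error)
  call h5sclose_f(dataspace, error)
  call h5dclose_f(did, error)
  call h5fclose_f(fid, error)
  call h5close_f(error)
end program read_slab
```

If the slab comes back looking transposed, check the dimension order: the Fortran API lists dimensions in the reverse order of the C API and of tools such as h5dump, so offset and count may need their two entries swapped.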