
Reading a single vector from HDF5 in C++


I have data stored in HDF5 format with shape (10000, 100): 10000 vectors of 100 floats each.

I want to extract the data from the file into C++ vectors, so for this data I would have a vector of 10000 elements, each of which is a vector of 100 floats.

I am trying to create a memspace with one dimension of 100 elements and then read a single row of the file dataset into memory, but I always get an error:

  #001: ../../../src/H5Dio.c line 487 in H5D__read(): src and dest dataspaces have different number of elements selected

Here is my code:

    H5File fp(... , H5F_ACC_RDONLY);
    DataSet dset = fp.openDataSet("/dataset");
    DataSpace dspace = dset.getSpace();
    hsize_t rank;
    hsize_t dims[2];
    rank = dspace.getSimpleExtentDims(dims, NULL);
    cout<<"Datasize: " << dims[0] << endl;

    // Define the memory dataspace
    hsize_t dimsm[1];
    dimsm[0] = dims[1];
    DataSpace memspace (1, dimsm);

    // create a vector the same size as the dataset
    vector<vector<float>> data;
    data.resize(dims[0]);
    for (hsize_t i = 0; i < dims[0]; i++) {
        data[i].resize(dims[1]);
    }
    //cout<<"Vectsize: "<< data.size() <<endl;

    // Initialize hyperslabs
    hsize_t dataCount[1] = {0,};
    hsize_t dataOffset[1] = {0,};
    hsize_t memCount[1] = {0,};
    hsize_t memOffset[1] = {0,};

    for (hsize_t i = 0; i < dims[0]; i++) {
        dataOffset[0] = i;
        dataCount[0] = dims[1];
        memOffset[0] = 0;
        memCount[0] = dims[1];
        dspace.selectHyperslab(H5S_SELECT_SET, dataCount, dataOffset);
        memspace.selectHyperslab(H5S_SELECT_SET, memCount, memOffset);

        dset.read(data[i].data(), PredType::IEEE_F32LE, memspace, dspace);
        printf("OK %d\n", (int)i);
    }


Solution

  • The dataset's dataspace is 2D, but you manipulate it with a 1D count and offset. selectHyperslab expects count and offset arrays with one entry per dimension of the dataspace, so it reads garbage beyond the end of your one-element arrays. Try it like this:

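        // The file dataspace is 2D, so count and offset need two entries each:
        // one row (count 1) spanning all dims[1] columns.
        hsize_t dataCount[2] = {1, dims[1]};
        hsize_t dataOffset[2] = {0, 0};
        // The memory dataspace is 1D and its selection never changes.
        const hsize_t memCount[1] = {dims[1]};
        const hsize_t memOffset[1] = {0};
        memspace.selectHyperslab(H5S_SELECT_SET, memCount, memOffset);
        for (hsize_t i = 0; i < dims[0]; i++) {
            dataOffset[0] = i;  // move the file selection to row i
            dspace.selectHyperslab(H5S_SELECT_SET, dataCount, dataOffset);
            dset.read(data[i].data(), PredType::NATIVE_FLOAT, memspace, dspace);
        }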
    

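    The memory-side count and offset never change, so they are const and the memory selection is set once, outside the loop. I'm not even sure you need to select a hyperslab on the memory dataspace at all: as far as I know, a newly created dataspace already has its whole extent selected, which here is exactly dims[1] elements. Also, I've changed the memory datatype passed to read to the native float (NATIVE_FLOAT). You should read in the format of the platform, even if the dataset is stored as IEEE_F32LE for consistency. HDF5 will handle the conversion.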
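
The error message comes down to counting: the file selection and the memory selection must pick the same number of elements. The sketch below works through that arithmetic for one row of the (10000, 100) dataset without calling HDF5 at all. The names extent_t, RowSelection, selectRow and selectedElements are hypothetical helpers made up for this illustration, and extent_t is only a stand-in for hsize_t so the code compiles without the library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for HDF5's hsize_t so this sketch builds without the library (assumption).
using extent_t = std::uint64_t;

// Hypothetical helper type: the offset/count pair for one row of a rank-2 dataset.
struct RowSelection {
    std::array<extent_t, 2> offset;
    std::array<extent_t, 2> count;
};

// Build the rank-2 selection for a single row: start at (row, 0) and take
// 1 x dims[1] elements. Both arrays have one entry per dataset dimension.
RowSelection selectRow(const std::array<extent_t, 2>& dims, extent_t row) {
    assert(row < dims[0]);  // the row must exist in the dataset
    RowSelection sel;
    sel.offset = {row, 0};
    sel.count = {1, dims[1]};
    return sel;
}

// Number of elements picked by a hyperslab with default stride and block:
// the product of the per-dimension counts.
template <std::size_t Rank>
extent_t selectedElements(const std::array<extent_t, Rank>& count) {
    extent_t n = 1;
    for (extent_t c : count) {
        n *= c;
    }
    return n;
}
```

For dims = {10000, 100}, selectRow gives a file-side count of {1, 100}, which selects 100 elements, the same as a 1D memory count of {100}. If extent_t were swapped for hsize_t, sel.count.data() and sel.offset.data() could presumably be passed to dspace.selectHyperslab in the loop above, in that order.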