
Create hdf5 file from scratch using file image operations (memory mapped hdf5 files)


Problem: I want to use memory mapped HDF5 files for our unit tests. Is it possible to create them from scratch?

Status: I've read up on the HDF5 file image operations document and tried to apply it. Depending on the exact parameters used, I either get an invalid file identifier (-1) or the subsequent creation of datasets fails.

Typically our unit tests write new test files, mimicking users saving newly created data to a file on disk, so there is no existing file yet. The documentation on HDF5 file image operations assumes that an initial file image already exists. I don't have one, as I'm trying to stay as close as possible to the actual user scenario with my tests. Can such a file be created from an empty buffer?

static const unsigned int FileSize = 1024 * 1024 * 100;
std::vector<unsigned char> buffer(FileSize, 0);     // initialize buffer with zeroes
int flags = H5LT_FILE_IMAGE_DONT_COPY | 
            H5LT_FILE_IMAGE_OPEN_RW | 
            H5LT_FILE_IMAGE_DONT_RELEASE;
m_file = H5LTopen_file_image(static_cast<void*>(buffer.data()), buffer.size(), flags);

If I want to keep ownership of the buffer, as in the example, I don't get a valid file id. I suspected a bug in HDF5, but leaving out the flags H5LT_FILE_IMAGE_DONT_COPY | H5LT_FILE_IMAGE_DONT_RELEASE didn't work either.


Solution

  • Apparently H5LTopen_file_image wraps some calls that also allow for virtual file creation. All of this is managed by the core file driver, so the desired result can be achieved by passing the right parameters to the core file driver directly.

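    // File access property list that selects the core (in-memory) file driver.
    auto propertyList = H5Pcreate(H5P_FILE_ACCESS);
    auto h5Result = H5Pset_fapl_core(propertyList, m_buffer.GetSize(), false);
    assert(h5Result >= 0 && "H5Pset_fapl_core failed");
    // name and flags are defined elsewhere in the test fixture.
    m_file = H5Fcreate(name, flags, H5P_DEFAULT, propertyList);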
    

    The last parameter of H5Pset_fapl_core is the backing_store flag. If it is set to false, the file contents are never written to disk. The second parameter is the increment, in bytes, by which the in-memory file grows whenever more space is needed.

    Note that in the end I had to use all the advanced techniques from the document referred to in the question to get everything working properly. The document is a good reference but slightly outdated (the enums have similar but different names in the latest release).