Opening HDF5 file without modifying file timestamp


I am writing an R function that converts the output of an external program (an HDF5 file) on a Linux machine to a different file format. Because of how my pipeline is structured, I need to keep the file's timestamps intact, mainly for reproducibility.

My function currently just wraps rhdf5::H5Fopen(), with some extra data transformation:

function(path_to_file){
  data <- rhdf5::H5Fopen(path_to_file,
    # preserve original file structure
    native = TRUE
  )

  data <- as.data.frame(data[["slot1"]])

  return(data)
}

However, the file's last-modified timestamp changes every time I read the file through this function. Is there any way to keep the original timestamp when opening the file? Thanks.


Solution

  • If you open the file in read-only mode, the timestamp won't be modified, e.g.

    library(rhdf5)
    
    h5file <- '/tmp/h5ex_t_array.h5'
    file.mtime( h5file )
    #> [1] "2022-06-27 15:23:43 CEST"
    
    # Read-only open: the modification time is left untouched.
    fid <- H5Fopen( h5file, flags = 'H5F_ACC_RDONLY' )
    H5Fclose(fid)
    file.mtime( h5file )
    #> [1] "2022-06-27 15:23:43 CEST"
    
    # Open with the default flags: the modification time is updated.
    fid <- H5Fopen( h5file )
    H5Fclose(fid)
    file.mtime( h5file )
    #> [1] "2024-02-08 12:24:53 CET"
    

    Remember to always pair an open operation with a close in HDF5 (here, H5Fclose()); otherwise you risk file-lock issues and memory leaks.
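
    One way to guarantee the close even when the read fails is to register it with on.exit(). The following is only a sketch: the helper name read_dataset_readonly() and its dataset argument are hypothetical, and it assumes the data of interest is a dataset (such as "slot1" from the question) that H5Dread() can read.

    ```r
    library(rhdf5)

    # Hypothetical helper: read one dataset from a file opened read-only.
    read_dataset_readonly <- function(path_to_file, dataset = "slot1") {
      # Open read-only so the file's modification time is not changed.
      fid <- H5Fopen(path_to_file, flags = "H5F_ACC_RDONLY")
      # Close the file handle when the function exits, even on error.
      on.exit(H5Fclose(fid), add = TRUE)

      did <- H5Dopen(fid, dataset)
      # after = FALSE runs this first, so the dataset closes before the file.
      on.exit(H5Dclose(did), add = TRUE, after = FALSE)

      as.data.frame(H5Dread(did))
    }
    ```

    Calling file.mtime() on the file before and after read_dataset_readonly() is a quick way to check that the timestamp is unchanged.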

    It might be easier to use h5read(), which opens the file in read-only mode by default and closes it automatically.
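
    Applied to the function from the question, that could look roughly like the sketch below. The function name read_slot1() is hypothetical, "slot1" is taken from the question, and native = TRUE is kept only to mirror the original call.

    ```r
    library(rhdf5)

    # Hypothetical rewrite of the question's function using h5read().
    read_slot1 <- function(path_to_file) {
      # h5read() opens the file, reads the named dataset, and closes the file.
      data <- h5read(path_to_file, name = "slot1", native = TRUE)
      as.data.frame(data)
    }
    ```

    Because no file handle is returned to the caller, there is nothing left to close by hand.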