
How to use the H5LTget_attribute_string function?


I asked this on the HDF Forum but haven't received an answer yet, so I thought I'd try my luck here.

I have created a small test file in Python (h5py) and want to use the H5LTget_attribute_string function to read an attribute from it. However, I'm not sure how to use this function.

My test file looks like this.

HDF5 "attr.h5" {
GROUP "/" {
   DATASET "my_dataset" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 12 ) / ( 12 ) }
      DATA {
      (0): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
      }
      ATTRIBUTE "string_attr" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_UTF8;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "this is a string"
         }
      }
   }
}
}

Looking at the documentation of H5LT_GET_ATTRIBUTE it seems to me that I need to allocate a buffer and pass the address of the buffer as the last parameter, after which the H5LT_GET_ATTRIBUTE function would fill the buffer. My first attempt was therefore this.

#include <assert.h>
#include <stdio.h>   /* fprintf */
#include <stdlib.h>
#include "hdf5.h"
#include "hdf5_hl.h"

int main()
{
    herr_t  status;

    hid_t file_id = H5Fopen("attr.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    assert(file_id >= 0);
    
    char string[1024] = "";  // assume the buffer is large enough
    
    fprintf(stderr, "string : %s\n", string);
    fprintf(stderr, "pointer: %p\n", string);

    fprintf(stderr, "---- reading attribute ----\n");
    status = H5LTget_attribute_string(file_id, "my_dataset", 
                                      "string_attr", string);
    assert(status >= 0);
    
    fprintf(stderr, "string : %s\n", string);
    fprintf(stderr, "pointer: %p\n", string);
    
    status = H5Fclose(file_id);
    assert(status >= 0);
}

However this didn't work as expected, see the output below.

string : 
pointer: 0x7ffe3f7ec770
---- reading attribute ----
string : @B�k2V
pointer: 0x7ffe3f7ec770

After some googling and experimenting I found out that the last parameter should instead be the address of a char pointer. The H5LT_GET_ATTRIBUTE function then makes that pointer point to the actual attribute value. The following program compiled with a warning, but it gave the correct output.

#include <assert.h>
#include <stdio.h>   /* fprintf */
#include <stdlib.h>
#include "hdf5.h"
#include "hdf5_hl.h"

int main()
{
    herr_t  status;

    hid_t file_id = H5Fopen("attr.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    assert(file_id >= 0);
    
    char* string = NULL;
    
    fprintf(stderr, "string : %s\n", string);
    fprintf(stderr, "pointer: %p\n", string);

    fprintf(stderr, "---- reading attribute ----\n");
    status = H5LTget_attribute_string(file_id, "my_dataset", 
                                      "string_attr", &string);
    assert(status >= 0);
    
    fprintf(stderr, "string : %s\n", string);
    fprintf(stderr, "pointer: %p\n", string);
    
    status = H5Fclose(file_id);
    assert(status >= 0);
}

Output

string : (null)
pointer: (nil)
---- reading attribute ----
string : this is a string
pointer: 0x559e9e3d1240

Now I am perfectly happy to use it like this, and I can cast the argument to char * (the declared type of the last parameter) to get rid of the warning, but I would like to be sure that this is the expected behavior. Ideally the documentation should be updated.

So my questions are:

  1. Is the second example correct?
  2. How long is the data in the string buffer valid? That is, when does the HDF library release the memory (e.g. when the file is closed)?
  3. Should I use strcpy to copy the string data before using it?

Solution

  • As pointed out by Scot Breitenfeld (from The HDF Group):

    If you are reading a variable length string with H5LTget_attribute_string (H5T_VARIABLE) then you don’t need to allocate the string, just pass in a pointer and the library will handle the allocations. If you are reading a fixed length string then you need to allocate a string that is “large enough”.

    So, (1) your second approach seems ok to me.

    As for (2) and (3), I would bet you are responsible for freeing the buffer, so there is no need to copy it. However, to be sure, you can use a debugger to check whether the library still accesses the buffer or, even better, run the program under valgrind without freeing the buffer and see whether it reports a memory leak.
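
To make the difference between the two conventions concrete, here is a small library-free model in plain C. It does not call HDF5 at all: model_read_variable and model_read_fixed are hypothetical stand-ins, written on the assumption that a variable-length read stores a char * in the caller's buffer while a fixed-length read copies the characters themselves. Under that assumption, the garbage printed by the first attempt is consistent with the raw bytes of a heap pointer sitting at the start of the 1024-byte array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for reading a variable-length string:
   allocates the text and writes a char * (not the text) into buf. */
int model_read_variable(void *buf)
{
    static const char text[] = "this is a string";
    assert(buf != NULL);
    char *copy = malloc(sizeof text);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, sizeof text);
    memcpy(buf, &copy, sizeof copy);  /* stores sizeof(char *) bytes */
    return 0;
}

/* Hypothetical stand-in for reading a fixed-length string:
   copies the characters into a buffer the caller allocated. */
int model_read_fixed(void *buf)
{
    static const char text[] = "this is a string";
    assert(buf != NULL);
    memcpy(buf, text, sizeof text);
    return 0;
}

/* Mirrors the second attempt: pass the address of a char pointer.
   The caller owns the returned string and must free it. */
char *read_via_pointer(void)
{
    char *s = NULL;
    if (model_read_variable(&s) < 0)
        return NULL;
    return s;
}

/* Mirrors the first attempt: pass a char array. The array then starts
   with pointer bytes, so printing it as text shows garbage; the pointer
   can still be recovered by copying those bytes back out. */
char *read_via_array(void)
{
    char buffer[1024];
    char *s = NULL;
    if (model_read_variable(buffer) < 0)
        return NULL;
    memcpy(&s, buffer, sizeof s);
    return s;
}
```

In this model both read_via_pointer() and read_via_array() hand back a heap string that the caller releases with free(). For real HDF5 code, whether to use free() or H5free_memory (available in HDF5 1.8.13 and later) is something to confirm against the library documentation or with valgrind, as suggested above.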