Error storing an HDF5 file with a list of strings


def storeFlagsFile(FLAGS_F, file_name, t0, text, ID):
    if not FLAGS_F:  # this flag doesn't work for multiple users
        f = h5py.File(file_name, "r+")
        data_content = np.array([np.round(time.time() - t0, 3), text])
        asciiList = np.array([str(n).encode("utf-8", "ignore") for n in data_content]).reshape(1, 2)
        dt = h5py.string_dtype(encoding='utf-8')
        dset = f[str(ID)].create_dataset('AcqFlags', data=asciiList, compression="gzip", chunks=True, maxshape=(None, 2), dtype=dt)
        FLAGS_F = 1
    else:
        f = h5py.File(file_name, "r+")        
        data_content = np.array([np.round(time.time() - t0, 3), text]) 
        asciiList = np.array([str(n).encode("utf-8", "ignore") for n in data_content]).reshape(1, 2)
        f[str(ID)+'/AcqFlags'].resize((f[str(ID)+'/AcqFlags'].shape[0] + 1), axis = 0)
        f[str(ID)+'/AcqFlags'][-1:] = asciiList

I want to store data like the following in a dataset of shape (None, 2), since I keep appending one row at a time by calling the storeFlagsFile function.

['4.412' 'a']
['5.412' 'b']
['6.412' 'c']
['8.226' 'd']

Here the first column comes from t0 and the second column is text; I pass both, one row at a time, to storeFlagsFile(FLAGS_F, file_name, t0, text, ID). FLAGS_F is initially 0 and ID = "122".

But when I inspect the HDF5 file, it looks like this: [screenshot not available]

Can anyone point out what I am doing wrong, please? Thank you!


Solution

  • It's not clear to me why you aren't getting 2 fields in your AcqFlags dataset; I was able to get your code segment to work with a small modification. I am using h5py 2.9.0, which predates h5py.string_dtype() (added in h5py 2.10.0 for variable-length strings), so I use h5py.special_dtype(vlen=str) instead. Both dt= declarations appear in the code, marked with comments; this difference is not an error in your code. I also open the file in a with block so that it is closed after each call. See below.

    import h5py, numpy as np
    import time
    
    def storeFlagsFile(FLAGS_F, file_name, t0, text, ID):
        if not FLAGS_F:  # this flag doesn't work for multiple users
            with h5py.File(file_name, "r+") as f:
                data_content = np.array([np.round(time.time() - t0, 3), text])
                asciiList = np.array([str(n).encode("utf-8", "ignore") for n in data_content]).reshape(1, 2)
                #dt = h5py.string_dtype(encoding='utf-8') # for h5py 2.10.0
                dt = h5py.special_dtype(vlen=str)   # for h5py 2.9.0
                dset = f[str(ID)].create_dataset('AcqFlags', data=asciiList, compression="gzip", chunks=True, maxshape=(None, 2), dtype=dt)
                FLAGS_F = 1
        else:
            with h5py.File(file_name, "r+") as f:      
                data_content = np.array([np.round(time.time() - t0, 3), text]) 
                asciiList = np.array([str(n).encode("utf-8", "ignore") for n in data_content]).reshape(1, 2)
                f[str(ID)+'/AcqFlags'].resize((f[str(ID)+'/AcqFlags'].shape[0] + 1), axis = 0)
                f[str(ID)+'/AcqFlags'][-1:] = asciiList
    
    file_name = 'SO_62064344.h5'
    ID = 122
    with h5py.File(file_name, 'w') as f:
        f.create_group(str(ID))
    
    # Note: the third argument is t0, so each call stores time.time() - t0,
    # not the literal value passed here.
    storeFlagsFile(False, file_name, 4.412, 'a', ID)
    storeFlagsFile(True, file_name, 5.412, 'b', ID)
    storeFlagsFile(True, file_name, 6.412, 'c', ID)
    storeFlagsFile(True, file_name, 8.226, 'd', ID)
    storeFlagsFile(True, file_name, 9.773, 'e', ID)
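
One way to check what actually landed in the file is to read the dataset back with h5py rather than relying on a viewer. The sketch below is self-contained: it builds a small scratch file with the same group/AcqFlags layout, appends a row, and reads everything back. The scratch file name and the read_rows helper are made up for illustration. It assumes h5py 2.10 or later for h5py.string_dtype; on 2.9.0, h5py.special_dtype(vlen=str) would take its place.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file; the group name "122" mirrors the ID in the question.
path = os.path.join(tempfile.mkdtemp(), "check_acqflags.h5")
dt = h5py.string_dtype(encoding="utf-8")  # assumes h5py >= 2.10

with h5py.File(path, "w") as f:
    grp = f.create_group("122")
    first = np.array([["4.412", "a"]], dtype=object)
    dset = grp.create_dataset("AcqFlags", data=first, dtype=dt,
                              maxshape=(None, 2), chunks=True)
    # Grow the dataset by one row and fill the new last row.
    dset.resize(dset.shape[0] + 1, axis=0)
    dset[-1:] = np.array([["5.412", "b"]], dtype=object)


def read_rows(file_name, group_id):
    """Return the AcqFlags rows as lists of Python str (illustrative helper)."""
    with h5py.File(file_name, "r") as f:
        raw = f[f"{group_id}/AcqFlags"][...]
    # Depending on the h5py version, string data comes back as bytes or str.
    return [[v.decode("utf-8") if isinstance(v, bytes) else v for v in row]
            for row in raw]


rows = read_rows(path, 122)
print(len(rows), rows)
```

If the number of rows and the two values per row match what was written, the data is stored as intended and any odd display is down to the viewer.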
    

    Additional thoughts/observations:

    1. I noticed you are storing the time value as a string. Is that what you want? HDF5 and h5py can store different datatypes in each field/column, so you can mix floats and strings if you want. This requires a compound dtype (like a NumPy record array).
    2. You use FLAGS_F as a flag to create the AcqFlags dataset. You can simplify this to test for existence, or use require_dataset.
    3. You are adding 1 row at a time to a resizable dataset. This is OK for "smallish" datasets, but could be a performance problem if you create 10e6 rows 1 row at a time.
    4. If you are interested, I answered other questions that show how to do #2 and #3 above. You might find one of these answers helpful:
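
Separately from those answers, points 1 to 3 could be combined roughly as follows. This is only a sketch under a few assumptions: h5py 2.10 or later, a compound dtype whose field names (time, flag) are invented here, and a hypothetical helper append_flags that creates AcqFlags on first use and otherwise appends a whole batch of rows with a single resize.

```python
import os
import tempfile

import h5py
import numpy as np

# Assumed record layout: a float64 "time" field and a variable-length
# UTF-8 string "flag" field. The field names are illustrative only.
row_dt = np.dtype([("time", "f8"),
                   ("flag", h5py.string_dtype(encoding="utf-8"))])


def append_flags(file_name, group_id, rows):
    """Append a list of (time, text) tuples to <group_id>/AcqFlags.

    The dataset is created on the first call, so no external flag is needed.
    """
    new = np.array(rows, dtype=row_dt)
    with h5py.File(file_name, "a") as f:
        grp = f.require_group(str(group_id))
        if "AcqFlags" in grp:
            dset = grp["AcqFlags"]
            start = dset.shape[0]
            # One resize per batch instead of one per row.
            dset.resize(start + len(new), axis=0)
            dset[start:] = new
        else:
            grp.create_dataset("AcqFlags", data=new,
                               maxshape=(None,), chunks=True)


# Usage with a scratch file (hypothetical name).
demo_path = os.path.join(tempfile.mkdtemp(), "acqflags_compound.h5")
append_flags(demo_path, 122, [(4.412, "a")])
append_flags(demo_path, 122, [(5.412, "b"), (6.412, "c")])

with h5py.File(demo_path, "r") as f:
    stored = f["122/AcqFlags"][...]

times = [float(t) for t in stored["time"]]
flags = [v.decode("utf-8") if isinstance(v, bytes) else v
         for v in stored["flag"]]
print(times, flags)
```

Keeping the time as a float means it can be sliced and compared numerically later, and batching the appends cuts down the number of resize calls when many rows arrive at once.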