I have a function that appends a new element to a dataset named 'array' in an HDF5 file every second.
import h5py
import numpy as np
from time import time, sleep

i = 100

def update_array():
    hf = h5py.File('task1.h5', 'r+')
    # read the whole existing dataset into memory
    old_rec = np.array(hf.get('array'))
    global i
    i = i + 1
    new_rec = np.append(old_rec, i)
    # delete the old record and replace it with the updated record
    del hf['array']
    new_data = hf.create_dataset('array', data=new_rec)
    print(new_rec)
    hf.close()

while True:
    # wake up at the start of the next whole second
    sleep(1 - time() % 1)
    update_array()
The output of the print line (this only shows the updated in-memory array; I do not know whether it is actually being saved in the file):
[101.]
[101. 102.]
[101. 102. 103.]
[101. 102. 103. 104.]
[101. 102. 103. 104. 105.]
[101. 102. 103. 104. 105. 106.]
[101. 102. 103. 104. 105. 106. 107.]
[101. 102. 103. 104. 105. 106. 107. 108.]
I want a separate notebook that can track the changes made by the above function and display the updated contents of this dataset in the HDF5 file.
I want a separate function for this task because I want to make sure that the updated content really gets saved in the HDF5 file, and to perform further on-the-fly operations on the values as they keep arriving.
Here is a potential solution: attach attributes to the 'array' dataset. Adding attributes to an HDF5 data object is easy with .attrs. It has a dictionary-like syntax: h5obj.attrs[attr_name] = attr_value. Attribute values can be ints, strings, floats, and arrays. You can add 2 attributes to your dataset with the following 2 lines:
hf['array'].attrs['Last Value'] = i
hf['array'].attrs['Time Added'] = ctime(time())
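As a rough sketch of how a separate notebook might use these attributes, the example below pairs a writer-side helper with a reader-side check. The names stamp, read_if_changed and demo_path are hypothetical and not part of h5py; the sketch assumes the writer has closed (or flushed) the file before the reader opens it.

```python
import os
import tempfile
from time import ctime, time

import h5py


def stamp(dset, value):
    """Writer side: record the latest value and the time it was added."""
    dset.attrs['Last Value'] = value
    dset.attrs['Time Added'] = ctime(time())


def read_if_changed(path, last_seen):
    """Reader side: return (data, last_value) if 'Last Value' changed, else None."""
    with h5py.File(path, 'r') as hf:
        dset = hf['array']
        last_value = dset.attrs['Last Value']
        if last_seen is not None and last_value == last_seen:
            return None  # nothing new since the previous check
        return dset[:], last_value


# Self-check against a throwaway file (hypothetical path, not task1.h5).
demo_path = os.path.join(tempfile.mkdtemp(), 'attrs_demo.h5')
with h5py.File(demo_path, 'w') as hf:
    dset = hf.create_dataset('array', data=[101])
    stamp(dset, 101)

first = read_if_changed(demo_path, last_seen=None)       # first look: data is returned
second = read_if_changed(demo_path, last_seen=first[1])  # unchanged file: None
```

In the reading notebook, read_if_changed could be called in a loop with sleep() between calls. Depending on the HDF5 build and its file-locking settings, opening the file while the writer still has it open may raise OSError, so a real poller would probably catch that and retry.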
To demonstrate, I added these lines to your code, along with several other modifications to address the following issues:
- Added create_array() to initially create the file and dataset. I created it as a resizable dataset to simplify logic in update_array().
- Modified the update_array() code to enlarge the dataset and append the new value. This is much cleaner (and faster) than your 4 step process.
- Used the with / as: context manager to open the file. This eliminates the need to close it, and (more importantly) ensures it is closed cleanly if the program exits abnormally.
- Used hf['array'][:] instead of np.array(hf.get('array')).
- You could also move the with / as: lines into the main and pass the resulting hf object to the create_array() and update_array() functions. If you do that, you can easily consolidate the 2 functions. (You will need logic to test if the 'array' dataset exists.)
Code below:
import h5py
from time import time, sleep, ctime

def create_array():
    with h5py.File('task1.h5', 'w') as hf:
        global i
        # create dataset and add new record
        new_data = hf.create_dataset('array', shape=(1,), maxshape=(None,),
                                     data=[i])
        # add attributes
        hf['array'].attrs['Last Value'] = i
        hf['array'].attrs['Time Added'] = ctime(time())
        print(hf['array'][:])

def update_array():
    with h5py.File('task1.h5', 'r+') as hf:
        global i
        i += 1
        # resize dataset and add new record
        a0 = hf['array'].shape[0]
        hf['array'].resize(a0+1, axis=0)
        hf['array'][a0] = i
        # add attributes
        hf['array'].attrs['Last Value'] = i
        hf['array'].attrs['Time Added'] = ctime(time())
        print(hf['array'][:])

i = 100
create_array()
while i < 110:
    sleep(1 - time() % 1)
    update_array()
print('Done')
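The consolidation suggested above (open the file once in the main code, pass hf in, and test whether 'array' exists) might look roughly like this. It is only a sketch: append_value and demo_path are hypothetical names, and the loop runs without sleep() so that it finishes immediately.

```python
import os
import tempfile
from time import ctime, time

import h5py


def append_value(hf, value, name='array'):
    """Append value to a 1-D resizable dataset, creating it on first use."""
    if name not in hf:
        # First call: create a resizable dataset holding one record.
        dset = hf.create_dataset(name, shape=(1,), maxshape=(None,),
                                 data=[value])
    else:
        # Later calls: grow by one element and write into the new slot.
        dset = hf[name]
        n = dset.shape[0]
        dset.resize(n + 1, axis=0)
        dset[n] = value
    # Same two attributes as in the code above.
    dset.attrs['Last Value'] = value
    dset.attrs['Time Added'] = ctime(time())
    return dset


# Throwaway file (hypothetical path); mode 'a' creates it if it is missing.
demo_path = os.path.join(tempfile.mkdtemp(), 'append_demo.h5')
with h5py.File(demo_path, 'a') as hf:
    for value in (101, 102, 103):
        append_value(hf, value)
        hf.flush()  # push buffered changes to disk after each append
    result = hf['array'][:].tolist()
    last = int(hf['array'].attrs['Last Value'])
```

One trade-off to keep in mind: holding the file open for the whole loop may prevent a separate notebook from opening it at the same time, whereas opening and closing it once per update (as in the code above) leaves gaps in which a reader can get in.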