I've written two scripts.
The first script is designed to run only once: it creates a large, empty dataframe named df_empty, which is saved into an HDF5 file, storage.h5, using the following code within the script:
from pandas import HDFStore
hdf = HDFStore('storage.h5')
hdf.put('d1', df_empty, format='table', data_columns=True)
This works perfectly.
My second script is designed to run every 30 minutes, take data from the half-hourly-generated CSV files, and:
Put this data into a new dataframe df;
Import the dataframe from storage.h5 as df2;
Merge df and df2 using the index union command into df3;
Save the new dataframe back into storage.h5 under the same key, effectively overwriting the previous data.
The relevant section of the code is as follows:
from pandas import HDFStore
store = pd.HDFStore('storage.h5')
df2 = pd.DataFrame(store['d1'])
df3 = df.reindex(index = df2.index.union(df.index))
hdf.put('d1', df3, format='table', data_columns=True)
This works well if I run the two scripts sequentially in a Jupyter Notebook (I have installed the latest version of Anaconda and am running this on a Windows 7 machine).
However, when I run from the command prompt, I encounter problems. The first script runs with no errors, but the second script throws the following error:
Traceback (most recent call last):
  File "myfile.py", line 64, in <module>
    hdf.put('d1', df3, format='table', data_columns=True)
NameError: name 'hdf' is not defined
Closing remaining open files: storage.h5...done storage.h5...done
Does anyone have any suggestions as to what I might be doing wrong?
I can't comment because I don't have enough reputation, but could it be that you opened the HDF5 store and assigned it to the variable store, while you try to put in the new data using the variable hdf?
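A minimal sketch of both steps with one consistent variable name throughout. The tiny 'value' column and 'ts' index here are hypothetical stand-ins for the real CSV-derived data, and context managers are used so the store is closed cleanly even if an error occurs:

```python
import pandas as pd

# One-time setup (the first script): a small placeholder frame saved as 'd1'.
# A float dtype is used because all-NaN object columns cannot be serialized
# to table format.
df_empty = pd.DataFrame({'value': [float('nan')] * 2},
                        index=pd.Index([0, 1], name='ts'))
with pd.HDFStore('storage.h5') as store:
    store.put('d1', df_empty, format='table', data_columns=True)

# The half-hourly update (the second script), reusing the SAME name `store`.
df = pd.DataFrame({'value': [10.0, 20.0]}, index=pd.Index([2, 3], name='ts'))
with pd.HDFStore('storage.h5') as store:
    df2 = store['d1']                                  # load the stored frame
    df3 = df.reindex(index=df2.index.union(df.index))  # union of both indexes
    store.put('d1', df3, format='table', data_columns=True)
```

Note that df.reindex over the union keeps only df's values (rows present only in df2 come back as NaN); if you also want to preserve the previously stored values, something like df.combine_first(df2) may be closer to what you intend.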