HDFStore output in dataframe not series


I would like both tables that I read in to be stored as data frames.

I'm reading an h5 file into my code with:

with pd.HDFStore(directory_path) as store:
    self.df = store['/raw_ti4404']
    self.hr_df = store['/metric_heartrate']

self.df comes back as a data frame, but self.hr_df comes back as a series.

I am reading them both in the same way, and I don't understand why one is a data frame and the other a series. It might have something to do with how the data is stored in the file.

Any help on how to store the metric_heartrate as a data frame would be appreciated.


Solution

  • Most probably metric_heartrate was written to the store as a Series, so it is read back as a Series.

    Demo:

    Generate sample DF:

    In [123]: df = pd.DataFrame(np.random.rand(10, 3), columns=list('abc'))
    
    In [124]: df
    Out[124]:
              a         b         c
    0  0.404338  0.010642  0.686192
    1  0.108319  0.962482  0.772487
    2  0.564785  0.456916  0.496818
    3  0.122507  0.653329  0.647296
    4  0.348033  0.925427  0.937080
    5  0.750008  0.301208  0.779692
    6  0.833262  0.448925  0.553434
    7  0.055830  0.267205  0.851582
    8  0.189788  0.087814  0.902296
    9  0.045610  0.738983  0.831780
    
    In [125]: store = pd.HDFStore('d:/temp/test.h5')
    

    Let's store a column as Series:

    In [126]: store.append('ser', df['a'], format='t')
    

    Now let's store a DataFrame containing only one column, a:

    In [127]: store.append('df', df[['a']], format='t')
    

    Reading data from HDFStore:

    In [128]: store.select('ser')
    Out[128]:
    0    0.404338
    1    0.108319
    2    0.564785
    3    0.122507
    4    0.348033
    5    0.750008
    6    0.833262
    7    0.055830
    8    0.189788
    9    0.045610
    Name: a, dtype: float64
    
    In [129]: store.select('df')
    Out[129]:
              a
    0  0.404338
    1  0.108319
    2  0.564785
    3  0.122507
    4  0.348033
    5  0.750008
    6  0.833262
    7  0.055830
    8  0.189788
    9  0.045610
    

    Fix - read the Series and convert it to a DataFrame with to_frame():

    In [130]: store.select('ser').to_frame('a')
    Out[130]:
              a
    0  0.404338
    1  0.108319
    2  0.564785
    3  0.122507
    4  0.348033
    5  0.750008
    6  0.833262
    7  0.055830
    8  0.189788
    9  0.045610
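
The same fix can be wrapped in a small helper so that the reading code from the question always ends up with a DataFrame, whichever type was stored. This is only a sketch: `ensure_frame` and `read_as_frame` are hypothetical names introduced here, not pandas functions, and the sample values are made up for illustration.

```python
import pandas as pd


def ensure_frame(obj, name=None):
    """Return obj as a DataFrame; a Series becomes a one-column frame.

    Hypothetical helper, not part of pandas.
    """
    if isinstance(obj, pd.Series):
        # With an explicit name, use it as the column label;
        # otherwise keep the Series' own name.
        return obj.to_frame(name) if name is not None else obj.to_frame()
    return obj


def read_as_frame(path, key, name=None):
    """Read one key from an HDF5 file and coerce the result to a DataFrame.

    Hypothetical wrapper around the read shown in the question;
    HDFStore needs the PyTables package to be installed.
    """
    with pd.HDFStore(path, mode="r") as store:
        return ensure_frame(store[key], name)


# Illustration with in-memory data (values are made up)
ser = pd.Series([61.0, 64.5, 63.2], name="heartrate")
frame = ensure_frame(ser)
print(type(frame).__name__, list(frame.columns))
```

In the question's code this would be used as `self.hr_df = read_as_frame(directory_path, '/metric_heartrate')`. An object that is already a DataFrame is returned unchanged.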