Search code examples
pandasrows

Pandas df unstack rows


I have a dataframe with the following structure:

id       timestamp_str  y   x
2569     Jan.2018       4   9   
2569     Feb.2018       6   5
2569     Mar.2018       3   9   
2569     Apr.2018       2   10
2569     May.2018       6   9
3679     Jan.2018       14  52  
3679     Feb.2018       62  4   
3679     Mar.2018       5   89
3679     Apr.2018       9   4
3679     May.2018       32  9
.
.
.

and I would like to have this data into the following format:

id    parameter   Jan.2018   Feb.2018   Mar.2018   Apr.2018   May.2018
2569  y            4         6          3          2          6
2569  x            9         5          9          10         9
3679  y            14        62         5          9          32
3679  x            52        4          89         4          9

But so far I was not able to change the format, I tried something like this

df = (df.set_index(['id',df.groupby(['id']).cumcount().add(1)])
    .unstack()
    .sort_index(axis=1, level=[1, 0], ascending=[True, False]))
df.columns = df.columns.map(lambda x: f'{x[0]}{x[1]}')
df = df.reset_index()

Thanks for any hints,

cheers!


Solution

  • If need solutions without convert to datetimes:

    Use DataFrame.stack with Series.unstack for reshape, set original ordering of datetimes in DataFrame.reindex:

    df = (df.set_index(['id_x','timestamp_str'])
           .stack()
           .unstack(1)
           .reindex(df['timestamp_str'].unique(),axis=1)
           .rename_axis(index=['id','parameter'], columns=None)
           .reset_index())
    
    print (df)
         id parameter  Jan.2018  Feb.2018  Mar.2018  Apr.2018  May.2018
    0  2569         y         4         6         3         2         6
    1  2569         x         9         5         9        10         9
    2  3679         y        14        62         5         9        32
    3  3679         x        52         4        89         4         9
    

    Or like mentioned @sammywemmy DataFrame.melt with DataFrame.pivot:

    df = (df.melt(['id_x', 'timestamp_str'])
           .pivot(['id_x', 'variable'], 'timestamp_str', 'value')
           .reindex(df['timestamp_str'].unique(),axis=1)
           .rename_axis(index=['id','parameter'], columns=None)
           .reset_index())
    
    print (df)
         id parameter  Jan.2018  Feb.2018  Mar.2018  Apr.2018  May.2018
    0  2569         x         9         5         9        10         9
    1  2569         y         4         6         3         2         6
    2  3679         x        52         4        89         4         9
    3  3679         y        14        62         5         9        32
    

    Solutions with convert to datetimes - first convert column timestamp_str to datetimes, apply solution without reindex and add rename for original format of datetimes:

    df['timestamp_str'] = pd.to_datetime(df['timestamp_str'], format='%b.%Y')
    
    df = (df.set_index(['id_x','timestamp_str'])
           .stack()
           .unstack(1)
           .rename_axis(index=['id','parameter'], columns=None)
           .rename(columns= lambda x: x.strftime('%b.%Y'))
           .reset_index())
    

    df['timestamp_str'] = pd.to_datetime(df['timestamp_str'], format='%b.%Y')
    
    df = (df.melt(['id_x', 'timestamp_str'])
           .pivot(['id_x', 'variable'], 'timestamp_str', 'value')
           .rename_axis(index=['id','parameter'], columns=None)
           .rename(columns= lambda x: x.strftime('%b.%Y'))
           .reset_index())