Search code examples
pythonpandastransposepanel-data

How to transpose panel data into correct form using Pandas


I seem to be struggling with this conceptually so can't think how I will code it.

My data frame currently looks like this where I have split the GMT column into day, month and year columns and ID# is the ID number for each home with values underneath being the KwH usage at each time:

0    GMT         ID1    ID2   ID3   ...   ID1000   Day   Month   Year
1    01/01/2012  0.2    0.15  0.1   ...   0.1      1     1       2012
2    02/01/2012  0.3    0.2   0.2   ...   0.4      2     1       2012
3    03/01/2012  0.1    0.4   0.1   ...   0.5      3     1       2012
.    .           .      .     .     ...   .        .     .       . 
.    .           .      .     .     ...   .        .     .       . 
.    .           .      .     .     ...   .        .     .       .
1000 31/12/2012  0.3    0.2   0.05  ...   0.4      31    12      2012

However I want to transform it in such a way that it represents a panel data set such that it looks like:

0     ID   GMT         Day   Month   Year    KwH
1     1    01/01/2012  1     1       2012    0.2
2     1    02/01/2012  2     1       2012    0.3
3     1    03/01/2012  3     1       2012    0.1
.     .     .          .     .       .       .        
.     .     .          .     .       .       .        
.     .     .          .     .       .       .       
10000 1000 31/12/2012  31    12      2012    0.3

So essentially I want to transpose the dataframe in a way that transposes the ID columns so they become rows and the data associated with each ID becoming grouped by GMT ensuring the data is still matched correctly to the date columns.

I am struggling so far as the df.transpose() function appears to be too blunt an instrument to tarnspose with the nuance I am looking for.

Any help would be greatly appreciated!


Solution

  • I have taken a sample dataframe with GMT as index. Also, you can get later the day, month and year

                ID1  ID2    ID3  ID1000
    GMT             
    01/01/2012  0.2  0.15   0.1   0.1
    02/01/2012  0.3  0.20   0.2   0.4
    03/01/2012  0.1  0.40   0.1   0.5
    

    You can then use unstack to combine GMT and ID as the index and KwH as the values

    output_df = df.unstack().to_frame()
    output_df
    

    enter image description here

    Finally, you can make the required changes to get the df in the right format

    output_df = output_df.reset_index().rename(columns={'level_0': 'ID', 0: 'KwH'})
    output_df['ID'] = output_df['ID'].str.replace('ID', '') # remove ID from the ID column values
    output_df['GMT'] = pd.to_datetime(output_df['GMT'])
    output_df['Day'] = output_df['GMT'].dt.day
    output_df['Month'] = output_df['GMT'].dt.month
    output_df['Year'] = output_df['GMT'].dt.year
    output_df
    

    enter image description here