I seem to be struggling with this conceptually so can't think how I will code it.
My data frame currently looks like this where I have split the GMT column into day, month and year columns and ID# is the ID number for each home with values underneath being the KwH usage at each time:
0 GMT ID1 ID2 ID3 ... ID1000 Day Month Year
1 01/01/2012 0.2 0.15 0.1 ... 0.1 1 1 2012
2 02/01/2012 0.3 0.2 0.2 ... 0.4 2 1 2012
3 03/01/2012 0.1 0.4 0.1 ... 0.5 3 1 2012
. . . . . ... . . . .
. . . . . ... . . . .
. . . . . ... . . . .
1000 31/12/2012 0.3 0.2 0.05 ... 0.4 31 12 2012
However I want to transform it in such a way that it represents a panel data set such that it looks like:
0 ID GMT Day Month Year KwH
1 1 01/01/2012 1 1 2012 0.2
2 1 02/01/2012 2 1 2012 0.3
3 1 03/01/2012 3 1 2012 0.1
. . . . . . .
. . . . . . .
. . . . . . .
10000 1000 31/12/2012 31 12 2012 0.3
So essentially I want to transpose the dataframe in a way that transposes the ID columns so they become rows and the data associated with each ID becoming grouped by GMT ensuring the data is still matched correctly to the date columns.
I am struggling so far as the df.transpose()
function appears to be too blunt an instrument to tarnspose with the nuance I am looking for.
Any help would be greatly appreciated!
I have taken a sample dataframe with GMT as index. Also, you can get later the day, month and year
ID1 ID2 ID3 ID1000
GMT
01/01/2012 0.2 0.15 0.1 0.1
02/01/2012 0.3 0.20 0.2 0.4
03/01/2012 0.1 0.40 0.1 0.5
You can then use unstack to combine GMT
and ID
as the index and KwH
as the values
output_df = df.unstack().to_frame()
output_df
Finally, you can make the required changes to get the df in the right format
output_df = output_df.reset_index().rename(columns={'level_0': 'ID', 0: 'KwH'})
output_df['ID'] = output_df['ID'].str.replace('ID', '') # remove ID from the ID column values
output_df['GMT'] = pd.to_datetime(output_df['GMT'])
output_df['Day'] = output_df['GMT'].dt.day
output_df['Month'] = output_df['GMT'].dt.month
output_df['Year'] = output_df['GMT'].dt.year
output_df