I have a table that looks like this:
Index Group_Id Period Start Period End Value Value_Count
42 1016833 2012-01-01 2013-01-01 127491.00 17.0
43 1016833 2013-01-01 2014-01-01 48289.00 9.0
44 1016833 2014-01-01 2015-01-01 2048.00 2.0
45 1016926 2012-02-01 2013-02-01 913.00 1.0
46 1016926 2013-02-01 2014-02-01 6084.00 5.0
47 1016926 2014-02-01 2015-02-01 29942.00 3.0
48 1016971 2014-03-01 2015-03-01 0.00 0.0
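For reference, the sample above can be rebuilt as a DataFrame like this (column names are taken from the table; parsing the period columns as datetimes is my assumption):

import pandas as pd

df = pd.DataFrame({
    'Group_Id': [1016833, 1016833, 1016833, 1016926, 1016926, 1016926, 1016971],
    'Period Start': ['2012-01-01', '2013-01-01', '2014-01-01',
                     '2012-02-01', '2013-02-01', '2014-02-01', '2014-03-01'],
    'Period End': ['2013-01-01', '2014-01-01', '2015-01-01',
                   '2013-02-01', '2014-02-01', '2015-02-01', '2015-03-01'],
    'Value': [127491.0, 48289.0, 2048.0, 913.0, 6084.0, 29942.0, 0.0],
    'Value_Count': [17.0, 9.0, 2.0, 1.0, 5.0, 3.0, 0.0],
}, index=[42, 43, 44, 45, 46, 47, 48])

# treat the periods as dates rather than strings
df['Period Start'] = pd.to_datetime(df['Period Start'])
df['Period End'] = pd.to_datetime(df['Period End'])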
I am trying to end up with a 'wide' df where each Group_Id has one observation and the values/value counts are converted to columns that correspond to their respective period in order of recency. So the end result would look like:
Index Group_Id Value_P0 Value_P1 Value_P2 Count_P0 Count_P1 ...
42 1016833 2048.00 48289.00 127491.00 2.0 9.0
45 1016926 29942.00 6084.00 913.00 3.0 5.0
48 1016971 0.0 0.00 0.0 0.0 0.0
Where Value_P0 is the most recent value, Value_P1 is the next most recent value after that, and the Count columns work the same way.
I've tried pivoting the table so that Group_Id is the index, Period Start supplies the columns, and Value or Value_Count is the corresponding cell value:
Period Start 2006-07-01 2008-07-01 2009-02-01 2009-12-17 2010-02-01 2010-06-01 2010-07-01 2010-08-13 2010-09-01 2010-12-01 ... 2016-10-02 2016-10-20 2016-12-29 2017-01-05 2017-02-01 2017-03-28 2017-04-10 2017-05-14 2017-08-27 2017-09-15
Group_Id
1007310 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1007318 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1007353 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
This way each Group_Id is a single record, but I would then need to loop through each row of the many columns and pull out the non-NaN values, whose left-to-right order would run from oldest to newest. This seems like the wrong way to go about it, though.
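If I did go that route, I suppose the extraction wouldn't have to be an explicit loop; something like this untested sketch (assuming the pivot of Value above is called wide, a name of my choosing) would left-align each row's surviving values, reversed so the most recent comes first:

# 'wide' = the Group_Id x Period Start pivot of Value shown above (assumed name)
compact = wide.apply(
    lambda row: pd.Series(row.dropna().to_numpy()[::-1]),  # drop NaNs, most recent first
    axis=1,
).add_prefix('Value_P')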
I've also considered grouping by Group_Id and somehow creating a recency index per group (e.g. a timedelta relative to the most recent date), then pivoting/unstacking so that the columns are that index and the values are Value or Value_Count. I'm not sure how to do this though; roughly what I mean by the recency index is sketched below. I appreciate the help.
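A very rough, untested sketch of that index (the column name P is just a placeholder of mine):

# 0 = most recent period within each Group_Id
df['P'] = (df.groupby('Group_Id')['Period Start']
             .rank(method='first', ascending=False)
             .astype(int) - 1)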
Still using pivot
df['ID'] = df.groupby('Group_Id').cumcount()   # 0, 1, 2, ... within each Group_Id, in row order
d1 = df.pivot(index='Group_Id', columns='ID', values='Value').add_prefix('Value_P')
d2 = df.pivot(index='Group_Id', columns='ID', values='Value_Count').add_prefix('Count_P')
pd.concat([d1, d2], axis=1).fillna(0)
Out[347]:
ID Value_P0 Value_P1 Value_P2 Count_P0 Count_P1 Count_P2
Group_Id
1016833 127491.0 48289.0 2048.0 17.0 9.0 2.0
1016926 913.0 6084.0 29942.0 1.0 5.0 3.0
1016971 0.0 0.0 0.0 0.0 0.0 0.0
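Note that with the rows in their original oldest-first order, cumcount makes P0 the oldest period, whereas the question's expected output has P0 as the most recent. If that ordering matters, one option (my addition, not part of the answer above) is to sort each group newest-first before numbering:

df = df.sort_values(['Group_Id', 'Period Start'], ascending=[True, False])
df['ID'] = df.groupby('Group_Id').cumcount()   # now 0 = most recent period per group
d1 = df.pivot(index='Group_Id', columns='ID', values='Value').add_prefix('Value_P')
d2 = df.pivot(index='Group_Id', columns='ID', values='Value_Count').add_prefix('Count_P')
pd.concat([d1, d2], axis=1).fillna(0)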