Search code examples
pythonrpandasnumpydata-analysis

How to calculate incremental spend between two groups in a datafarme for a certain time period


I have a big data frame with the following information, with column day shows the date in dd/mm/yy format. For example, as following,

   UserID     group       day  sp       PU          new
0     213      test   6/10/14   4  $628.14     test-Red
1     314   control   6/10/14  15   $29.98      control
2     354      test  13/10/14   4  $554.58  test-NonRed
3    2131     test1  13/10/14   2   $60.41     test-Red
4     314  control1  13/10/14   1    $8.71      control
5     354     test1  20/10/14   1  $165.63  test-NonRed

I need to calculate the incremental spend in 6 weeks for the total test group (test-Red and test-NonRed) versus the control group. I need to show the result in absolute $ or %. What I have tried,

df.groupby(by=['PU','day']).sum().groupby(level=[0]).cumsum()

This gives me absolute sum for groups for each group, but the excepted answer is something like this, with % also for each group

Control             $# 
Test - NonRed       $# 
Test - Red          $#
Total Result        $#

Any help would be great


Solution

  • You need few steps to get there

    df.PU=df.PU.str[1:].astype(float)
    df.day=pd.to_datetime(df.day)
    new1=pd.pivot_table(df,index='new',columns='day',values='PU',aggfunc=sum,fill_value=0,margins=True)
    new1=new1.drop('All',1)
    new1.cumsum(1)
    Out[263]: 
    day          2014-06-10 00:00:00  2014-10-13 00:00:00  2014-10-20 00:00:00
    new                                                                       
    control                    29.98                38.69                38.69
    test-NonRed                 0.00               554.58               720.21
    test-Red                  628.14               688.55               688.55
    All                       658.12              1281.82              1447.45