How to sum values under GroupBy and consecutive date conditions?

Given table:

ID	LINE	SITE	DATE	UNITS	TOTAL
1	X	AAA	02-May-2017	12	30
2	X	AAA	03-May-2017	10	22
3	X	AAA	04-May-2017	22	40
4	Z	AAA	20-MAY-2017	15	44
5	Z	AAA	21-May-2017	8	30
6	Z	BBB	22-May-2017	10	32
7	Z	BBB	23-May-2017	25	52
8	K	CCC	02-Jun-2017	6	22
9	K	CCC	03-Jun-2017	4	33
10	K	CCC	12-Aug-2017	11	44
11	K	CCC	13-Aug-2017	19	40
12	K	CCC	14-Aug-2017	30	40

for each row if ID,LINE ,SITE equal to previous row (day) need to calculate as below (last day) and (last 3 days ) : Note that is need to insure date are consecutive under "groupby" of ID,LINE ,SITE columns

ID	LINE	SITE	DATE	UNITS	TOTAL	Last day	Last 3 days
1	X	AAA	02-May-2017	12	30	0	0
2	X	AAA	03-May-2017	10	22	12/30	12/30
3	X	AAA	04-May-2017	22	40	10/22	(10+12)/(30+22)
4	Z	AAA	20-MAY-2017	15	44	0	0
5	Z	AAA	21-May-2017	8	30	15/44	15/44
6	Z	BBB	22-May-2017	10	32	0	0
7	Z	BBB	23-May-2017	25	52	10/32	10/32
8	K	CCC	02-Jun-2017	6	22	0	0
9	K	CCC	03-Jun-2017	4	33	6/22	6/22
10	K	CCC	12-Aug-2017	11	44	4/33	0
11	K	CCC	13-Aug-2017	19	40	11/44	(11/44)
12	K	CCC	14-Aug-2017	30	40	19/40	(11+19/44+40)

Solution

In this cases i usually do a for loop with groupby:

import pandas as pd
import numpy as np

#copied your table
table = pd.read_csv('/home/fm/Desktop/stackover.csv')
table.set_index('ID', inplace = True)
table[['Last day','Last 3 days']] = np.nan

for i,r in table.groupby(['LINE' ,'SITE']):
    #First subset non sequential dates
    limits_interval = pd.to_datetime(r['DATE']).diff() != '1 days'
    #First element is a false positive, as its impossible to calculate past days from first day
    limits_interval.iloc[0]=False

    ids_subset = r.index[limits_interval].to_list()
    ids_subset.append(r.index[-1]+1) #to consider all values
    id_start = 0

    for id_end in ids_subset:    
        r_sub = r.loc[id_start:id_end-1, :].copy()
        id_start = id_end 

        #move all values one day off, if the database is as in your example (1 line per day) wont have problems
        r_shifted = r_sub.shift(1)

        r_sub['Last day']=r_shifted['UNITS']/r_shifted['TOTAL']

        aux_units_cumsum = r_shifted['UNITS'].cumsum()
        aux_total_cumsum = r_shifted['TOTAL'].cumsum()

        r_sub['Last 3 days'] = aux_units_cumsum/aux_total_cumsum

        r_sub.fillna(0, inplace = True)

        table.loc[r_sub.index,:]=r_sub.copy()

You can make a function and apply in groupby, it would be cleaner: Apply function to pandas groupby. It would be more elegant. Wish I could help you, good luck