Given table:
ID | LINE | SITE | DATE | UNITS | TOTAL |
---|---|---|---|---|---|
1 | X | AAA | 02-May-2017 | 12 | 30 |
2 | X | AAA | 03-May-2017 | 10 | 22 |
3 | X | AAA | 04-May-2017 | 22 | 40 |
4 | Z | AAA | 20-MAY-2017 | 15 | 44 |
5 | Z | AAA | 21-May-2017 | 8 | 30 |
6 | Z | BBB | 22-May-2017 | 10 | 32 |
7 | Z | BBB | 23-May-2017 | 25 | 52 |
8 | K | CCC | 02-Jun-2017 | 6 | 22 |
9 | K | CCC | 03-Jun-2017 | 4 | 33 |
10 | K | CCC | 12-Aug-2017 | 11 | 44 |
11 | K | CCC | 13-Aug-2017 | 19 | 40 |
12 | K | CCC | 14-Aug-2017 | 30 | 40 |
for each row if ID,LINE ,SITE equal to previous row (day) need to calculate as below (last day) and (last 3 days ) : Note that is need to insure date are consecutive under "groupby" of ID,LINE ,SITE columns
ID | LINE | SITE | DATE | UNITS | TOTAL | Last day | Last 3 days |
---|---|---|---|---|---|---|---|
1 | X | AAA | 02-May-2017 | 12 | 30 | 0 | 0 |
2 | X | AAA | 03-May-2017 | 10 | 22 | 12/30 | 12/30 |
3 | X | AAA | 04-May-2017 | 22 | 40 | 10/22 | (10+12)/(30+22) |
4 | Z | AAA | 20-MAY-2017 | 15 | 44 | 0 | 0 |
5 | Z | AAA | 21-May-2017 | 8 | 30 | 15/44 | 15/44 |
6 | Z | BBB | 22-May-2017 | 10 | 32 | 0 | 0 |
7 | Z | BBB | 23-May-2017 | 25 | 52 | 10/32 | 10/32 |
8 | K | CCC | 02-Jun-2017 | 6 | 22 | 0 | 0 |
9 | K | CCC | 03-Jun-2017 | 4 | 33 | 6/22 | 6/22 |
10 | K | CCC | 12-Aug-2017 | 11 | 44 | 4/33 | 0 |
11 | K | CCC | 13-Aug-2017 | 19 | 40 | 11/44 | (11/44) |
12 | K | CCC | 14-Aug-2017 | 30 | 40 | 19/40 | (11+19/44+40) |
In this cases i usually do a for loop with groupby:
import pandas as pd
import numpy as np
#copied your table
table = pd.read_csv('/home/fm/Desktop/stackover.csv')
table.set_index('ID', inplace = True)
table[['Last day','Last 3 days']] = np.nan
for i,r in table.groupby(['LINE' ,'SITE']):
#First subset non sequential dates
limits_interval = pd.to_datetime(r['DATE']).diff() != '1 days'
#First element is a false positive, as its impossible to calculate past days from first day
limits_interval.iloc[0]=False
ids_subset = r.index[limits_interval].to_list()
ids_subset.append(r.index[-1]+1) #to consider all values
id_start = 0
for id_end in ids_subset:
r_sub = r.loc[id_start:id_end-1, :].copy()
id_start = id_end
#move all values one day off, if the database is as in your example (1 line per day) wont have problems
r_shifted = r_sub.shift(1)
r_sub['Last day']=r_shifted['UNITS']/r_shifted['TOTAL']
aux_units_cumsum = r_shifted['UNITS'].cumsum()
aux_total_cumsum = r_shifted['TOTAL'].cumsum()
r_sub['Last 3 days'] = aux_units_cumsum/aux_total_cumsum
r_sub.fillna(0, inplace = True)
table.loc[r_sub.index,:]=r_sub.copy()
You can make a function and apply in groupby, it would be cleaner: Apply function to pandas groupby. It would be more elegant. Wish I could help you, good luck