Search code examples
pythongroup-byrolling-sum

How to do rolling sum with conditional window criteria on different index levels in Python


I want to do a rolling sum based on different levels of the index but am struggling to make it a reality. Instead of explaining the problem am giving below the demo input and desired output along with the kind of insights am looking for.

So I have multiple brands and each of their sales of various item categories in different year month day grouped by as below. What I want is a dynamic rolling sum at each day level, rolled over a window on Year as asked.

for eg, if someone asks

Demo question 1) Till a certain day(not including that day) what were their last 2 years' sales of that particular category for that particular brand.

I need to be able to answer this for every single day i.e every single row should have a number as shown in Table 2.0.

I want to be able to code in such a way that if the question changes from 2 years to 3 years I just need to change a number. I also need to do the same thing at the month's level.

demo question 2) Till a certain day(not including that day) what was their last 3 months' sale of that particular category for that particular year for that particular brand.

Below is demo input

The tables are grouped by brand,category,year,month,day and sum of sales from a master table which had all the info and sales at hour level each day

Table 1.0

Brand Category Year Month Day Sales
ABC Big Appliances 2021 9 3 0
Clothing 2021 9 2 0
Electronics 2020 10 18 2
Utensils 2020 10 18 0
2021 9 2 4
3 0
XYZ Big Appliances 2012 4 29 7
2013 4 7 6
Clothing 2012 4 29 3
Electronics 2013 4 9 1
27 2
5 4 5
2015 4 27 7
5 2 2
Fans 2013 4 14 4
5 4 0
2015 4 18 1
5 17 11
2016 4 12 18
Furniture 2012 5 4 1
8 6
20 4
2013 4 5 1
7 8
9 2
2015 4 18 12
27 15
5 2 4
17 3
Musical-inst 2012 5 18 10
2013 4 5 6
2015 4 16 10
18 0
2016 4 12 1
16 13
Utencils 2012 5 8 2
2016 4 16 3
18 2
2017 4 12 13

Below is desired output for demo question 1 based on the demo table(last 2 years cumsum not including that day)

Table 2.0

Brand Category Year Month Day Sales Conditional Cumsum(till last 2 years)
ABC Big Appliances 2021 9 3 0 0
Clothing 2021 9 2 0 0
Electronics 2020 10 18 2 0
Utensils 2020 10 18 0 0
2021 9 2 4 0
3 0 4
XYZ Big Appliances 2012 4 29 7 0
2013 4 7 6 7
Clothing 2012 4 29 3 0
Electronics 2013 4 9 1 0
27 2 1
5 4 5 3
2015 4 27 7 8
5 2 2 15
Fans 2013 4 14 4 0
5 4 0 4
2015 4 18 1 4
5 17 11 5
2016 4 12 18 12
Furniture 2012 5 4 1 0
8 6 1
20 4 7
2013 4 5 1 11
7 8 12
9 2 20
2015 4 18 12 11
27 15 23
5 2 4 38
17 3 42
Musical-inst 2012 5 18 10 0
2013 4 5 6 10
2015 4 16 10 6
18 0 16
2016 4 12 1 10
16 13 11
Utencils 2012 5 8 2 0
2016 4 16 3 0
18 2 3
2017 4 12 13 5

End thoughts:

The idea is to basically do a rolling window over year column maintaining the 2 years span criteria and keep on summing the sales figures.

P.S I really need a fast solution due to the huge data size and therefore created a .apply function row-wise which I didn't find feasible. A better solution by using some kind of group rolling sum or supporting columns will be really helpful.


Solution

  • Here I'm giving a sample solution for the above problem.
    I have concidered just onr product so that the solution would be simple

    Code:

    from datetime import date,timedelta
    Input={"Utencils": [[2012,5,8,2],[2016,4,16,3],[2017,4,12,13]]}
    Input1=Input["Utencils"]
    Limit=timedelta(365*2)
    cumsum=0
    lis=[]
    Tot=[]
    for i in range(len(Input1)):
        if(lis):
            while(lis):
                idx=lis[0]
                Y,M,D=Input1[i][:3]
                reqDate=date(Y,M,D)-Limit
                Y,M,D=Input1[idx][:3]
                if(date(Y,M,D)<=reqDate):
                    lis.pop(0)
                    cumsum-=Input1[idx][3]
                else:
                    break
        Tot.append(cumsum)
        lis.append(i)
        cumsum+=Input1[i][3]
    print(Tot)
    

    Here Tot would output the required cumsum column for the given data. Output:

    [0, 0, 3]
    

    Here you can specify the Time span using Number of days in Limit variable. Hope this solves the problem you are looking for.