Search code examples
pythonpandaspandas-groupbycounterdata-manipulation

How to create a cumulative counter for sales in a financial year?


My df looks like this.

Policy_No   Date
1           10/1/2020
2           20/2/2020
3           20/2/2020
4           23/3/2020
5           18/4/2020
6           30/4/2020
7           30/4/2020

I would like to create a cumulative counter of policies logged in different dates based on the financial year (April-March)

Date         Cum count of policies
10/1/2020    1
20/2/2020    3
23/3/2020    4
18/4/2021    1
30/4/2021    3

18th April 2021 being a new financial year, the counter starts from 0. Can someone help solve this?


Solution

  • there's a function called cumsum which does that:

    df = pd.DataFrame({"Policy_No":[1,2,3,4,5,6,7],"Date":["10/1/2020","20/2/2020","20/2/2020","23/3/2020","18/4/2020","30/4/2020","30/4/2020"]})
    
    print(df)
    #0  1   10/1/2020
    #1  2   20/2/2020
    #2  3   20/2/2020
    #3  4   23/3/2020
    #4  5   18/4/2020
    #5  6   30/4/2020
    #6  7   30/4/2020
    
    df.groupby("Date")["Policy_No"].count().cumsum()
    
    #Date
    #10/1/2020    1
    #18/4/2020    2
    #20/2/2020    4
    #23/3/2020    5
    #30/4/2020    7
    

    If you want to do it for each financial year, I think you'll need to create a dataframe for each financial year, use the above logic, and concat them at last:

    df = ... #dataframe
    year_2020 = pd.to_datetime("01/04/2020")<= df["date"] < pd.to_datetime("01/04/2021")
    df_2020 = df.loc[year_2020].groupby("date")["Policy_No"].count().cumsum()
    
    year_2021 = pd.to_datetime("01/04/2021")<= df["date"] < pd.to_datetime("01/04/2022")
    df_2021 = df.loc[year_2021].groupby("date")["Policy_No"].count().cumsum()
    
    #concat at the end
    
    df_total = pd.concat((df_2020,df_2021))
    

    Of course if you cannot do the year logic (because there's to many), you can place it within a loop like:

    def get_financial_dates():
         """
          Some function that returns the start and end
          of each financial year
         """
         return date_start,date_end
    
    
    df_total = pd.DataFrame() #initial dataframe
    
    for date_start, date_end in get_financial_dates():
        idx = date_start <= df["date"] < date_end
        df_temp = df.loc[idx].groupby("date")["Policy_No"].count().cumsum()
    
        #concat at the end  
        df_total = pd.concat((df_total,df_temp))