My df looks like this.
Policy_No Date
1 10/1/2020
2 20/2/2020
3 20/2/2020
4 23/3/2020
5 18/4/2020
6 30/4/2020
7 30/4/2020
I would like to create a cumulative counter of policies logged in different dates based on the financial year (April-March)
Date Cum count of policies
10/1/2020 1
20/2/2020 3
23/3/2020 4
18/4/2021 1
30/4/2021 3
18th April 2021 being a new financial year, the counter starts from 0. Can someone help solve this?
there's a function called cumsum which does that:
df = pd.DataFrame({"Policy_No":[1,2,3,4,5,6,7],"Date":["10/1/2020","20/2/2020","20/2/2020","23/3/2020","18/4/2020","30/4/2020","30/4/2020"]})
print(df)
#0 1 10/1/2020
#1 2 20/2/2020
#2 3 20/2/2020
#3 4 23/3/2020
#4 5 18/4/2020
#5 6 30/4/2020
#6 7 30/4/2020
df.groupby("Date")["Policy_No"].count().cumsum()
#Date
#10/1/2020 1
#18/4/2020 2
#20/2/2020 4
#23/3/2020 5
#30/4/2020 7
If you want to do it for each financial year, I think you'll need to create a dataframe for each financial year, use the above logic, and concat them at last:
df = ... #dataframe
year_2020 = pd.to_datetime("01/04/2020")<= df["date"] < pd.to_datetime("01/04/2021")
df_2020 = df.loc[year_2020].groupby("date")["Policy_No"].count().cumsum()
year_2021 = pd.to_datetime("01/04/2021")<= df["date"] < pd.to_datetime("01/04/2022")
df_2021 = df.loc[year_2021].groupby("date")["Policy_No"].count().cumsum()
#concat at the end
df_total = pd.concat((df_2020,df_2021))
Of course if you cannot do the year logic (because there's to many), you can place it within a loop like:
def get_financial_dates():
"""
Some function that returns the start and end
of each financial year
"""
return date_start,date_end
df_total = pd.DataFrame() #initial dataframe
for date_start, date_end in get_financial_dates():
idx = date_start <= df["date"] < date_end
df_temp = df.loc[idx].groupby("date")["Policy_No"].count().cumsum()
#concat at the end
df_total = pd.concat((df_total,df_temp))