Tags: python, pandas, datetime, pandas-groupby, sumifs

Populating empty data frame from conditional checks on a table


I’m looking for some bearings on how to approach this problem.

I have a table, 'AnnualContracts', which contains the following types of data:

Year1Amount Year1Payday  Year2Amount Year2Payday  Year3Amount Year3Payday
     1000.0  2020-08-01       1000.0  2021-08-01       1000.0  2022-08-01
     2400.0  2021-06-01       3400.0  2022-06-01       4400.0  2023-06-01
     1259.0  2019-05-01       1259.0  2020-05-01       1259.0  2021-05-01
     2150.0  2021-08-01       2150.0  2022-08-01       2150.0  2023-08-01

This pattern continues up to 5 years and 380+ rows. There are four types of customers, each with its own table set up like the one above: Annual paying, Bi-Annual paying, Quarterly paying and Monthly paying.

I also have an empty dataframe (SumOfPayments) whose index is based on variables which update each month, and whose columns are the customer types mentioned above.

It looks like this:

             Annual  Bi-Annual  Quarterly  Monthly
12monthsago
11monthsago
10monthsago

and so on, until it reaches 60 months into the future.

The indexes on the SumOfPayments dataframe and the YearXPayday values are all set to the 1st of their respective months, so they can be matched with ==.

(as an example of how the index variables are set on the SumOfPayments table):

from datetime import datetime
from dateutil.relativedelta import relativedelta

# "12monthsago" isn't a valid Python identifier; using an equivalent name:
twelve_months_ago = datetime.today().replace(day=1, hour=0, minute=0, second=0, microsecond=0) + relativedelta(months=-12)

So if today's date is 13/08/2021, the above would produce 2020-08-01 00:00:00.

The intention behind this is to:

  1. group the YearXPaydays by date, and total the YearXAmounts for each grouped date

  2. check those grouped sums against the index on the SumOfPayments dataframe, and enter each sum wherever the dates match
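The two steps above can be sketched as follows. This is a minimal sketch using toy stand-in data, assuming each (YearXAmount, YearXPayday) column pair can be stacked into one long (date, amount) table:

```python
import pandas as pd

# Toy stand-in for AnnualContracts (assumed column layout, two years only)
contracts = pd.DataFrame({
    'Year1Amount': [1000.0, 2400.0],
    'Year1Payday': ['2021-08-01', '2021-06-01'],
    'Year2Amount': [1000.0, 3400.0],
    'Year2Payday': ['2022-08-01', '2022-06-01'],
})

# Step 1: stack every (Amount, Payday) pair into long form and sum per date
pairs = [
    contracts[[f'Year{i}Amount', f'Year{i}Payday']]
        .set_axis(['Amount', 'Payday'], axis=1)
    for i in (1, 2)
]
long = pd.concat(pairs, ignore_index=True)
long['Payday'] = pd.to_datetime(long['Payday'])
sums = long.groupby('Payday')['Amount'].sum()

# Step 2: align the sums onto a month-start index (stand-in for SumOfPayments)
idx = pd.date_range('2021-06-01', '2022-08-01', freq='MS')
annual = sums.reindex(idx)  # NaN where no payments fall in that month
```

Because both sides use month-start timestamps, `reindex` does the == matching for you, leaving NaN in the months with no payments.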

example (based on the tables above)

AnnualContracts:

Year1Amount Year1Payday  Year2Amount Year2Payday  Year3Amount Year3Payday
     1000.0  2020-08-01       1000.0  2021-08-01       1000.0  2022-08-01
     2400.0  2021-06-01       3400.0  2022-06-01       4400.0  2023-06-01
     1259.0  2019-05-01       1259.0  2020-05-01       1259.0  2021-05-01
     2150.0  2021-08-01       2150.0  2022-08-01       2150.0  2023-08-01

SumOfPayments:

              Annual  Bi-Annual  Quarterly  Monthly
12monthsago   1000.0
11monthsago
10monthsago
9monthsago
8monthsago
7monthsago
6monthsago
5monthsago
4monthsago
3monthsago    1259.0
2monthsago    2400.0
1monthsago
currentmonth  3150.0

Any help on this would be massively appreciated, thanks in advance for any assistance.


Solution

  • You could use wide_to_long if your column names were a little different. Instead I'll just split and melt them to get the data into the right shape. If you're curious about what's happening, print out dt and amt to see what they look like after melting.

    Then you can create your output table using 13 periods (this month plus the past 12 months) and start it from the beginning of the month one year ago.

    You can create multiple tables, one for each level of aggregation you want (annual, bi-annual, etc.). Then just merge them to the table with the date range.
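For reference, here is a hedged sketch of the wide_to_long route mentioned above. pd.wide_to_long needs the stub name first (Amount1, Payday1, ...), so the assumption here is that renaming Year1Amount to Amount1 and so on is acceptable:

```python
import pandas as pd

df = pd.DataFrame({'Year1Amount': [1000.0, 2400.0],
                   'Year1Payday': ['2020-08-01', '2021-06-01'],
                   'Year2Amount': [1000.0, 3400.0],
                   'Year2Payday': ['2021-08-01', '2022-06-01']})

# wide_to_long wants stub-prefixed names: 'Year1Amount' -> 'Amount1'
df = df.rename(columns=lambda c: c[5:] + c[4])
df['id'] = df.index  # wide_to_long needs an id column

long = (pd.wide_to_long(df, stubnames=['Amount', 'Payday'], i='id', j='Year')
          .reset_index())
long['Payday'] = pd.to_datetime(long['Payday'])
```

This yields one row per contract-year with Amount and Payday columns, the same long shape the melt approach below produces.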

    import pandas as pd
    from datetime import date, timedelta
    
    df = pd.DataFrame({'Year1Amount': {0: 1000.0, 1: 2400.0, 2: 1259.0, 3: 2150.0},
     'Year1Payday': {0: '2020-08-01',
      1: '2021-06-01',
      2: '2019-05-01',
      3: '2021-08-01'},
     'Year2Amount': {0: 1000.0, 1: 3400.0, 2: 1259.0, 3: 2150.0},
     'Year2Payday': {0: '2021-08-01',
      1: '2022-06-01',
      2: '2020-05-01',
      3: '2022-08-01'},
     'Year3Amount': {0: 1000.0, 1: 4400.0, 2: 1259.0, 3: 2150.0},
     'Year3Payday': {0: '2022-08-01',
      1: '2023-06-01',
      2: '2021-05-01',
      3: '2023-08-01'}})
    
    hist = pd.DataFrame({'Date':pd.date_range(start=(date.today() - timedelta(days=365)).replace(day=1),
                                              freq=pd.offsets.MonthBegin(),
                                              periods=13)})
    
    
    
    # Split and melt
    dt = df[[x for x in df.columns if 'Payday' in x]].melt(value_name='Date')
    amt = df[[x for x in df.columns if 'Amount' in x]].melt(value_name='Annual')
    
    # Combine and make datetime
    df = pd.concat([amt['Annual'], dt['Date']],axis=1)
    df['Date'] = pd.to_datetime(df['Date'])
    
    
    # Do each aggregation into a new dataframe like this; you'll need one for each column.
    # Here's how to do the annual one:
    annual_sum = df.groupby('Date', as_index=False).sum()
    
    # For each aggregation, merge to the hist df
    hist = hist.merge(annual_sum, on='Date', how='left')
    

    Output

             Date  Annual
    0  2020-08-01  1000.0
    1  2020-09-01     NaN
    2  2020-10-01     NaN
    3  2020-11-01     NaN
    4  2020-12-01     NaN
    5  2021-01-01     NaN
    6  2021-02-01     NaN
    7  2021-03-01     NaN
    8  2021-04-01     NaN
    9  2021-05-01  1259.0
    10 2021-06-01  2400.0
    11 2021-07-01     NaN
    12 2021-08-01  3150.0
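To get from this output back to the SumOfPayments layout in the question, one option (a sketch, assuming month-start dates on both sides and a fixed "today" so it is reproducible) is to label each date by its offset in months from the current month:

```python
import pandas as pd

# Stand-in for the merged hist table above
hist = pd.DataFrame({'Date': pd.date_range('2020-08-01', periods=13, freq='MS'),
                     'Annual': [1000.0] + [None] * 12})

today = pd.Timestamp('2021-08-13')  # fixed "today" for reproducibility
current = today.normalize().replace(day=1)

# Offset in whole months between the current month and each row's date
months_back = ((current.year - hist['Date'].dt.year) * 12
               + (current.month - hist['Date'].dt.month))
hist.index = [f'{m}monthsago' if m > 0 else 'currentmonth' for m in months_back]
```

The same relabelling would apply to the bi-annual, quarterly and monthly columns once they are merged in.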