I’m looking for some bearings on how to approach this problem.
I have a table ‘AnnualContracts’ containing the following kinds of data:
Year1Amount | Year1Payday | Year2Amount | Year2Payday | Year3Amount | Year3Payday |
---|---|---|---|---|---|
1000.0 | 2020-08-01 | 1000.0 | 2021-08-01 | 1000.0 | 2022-08-01 |
2400.0 | 2021-06-01 | 3400.0 | 2022-06-01 | 4400.0 | 2023-06-01 |
1259.0 | 2019-05-01 | 1259.0 | 2020-05-01 | 1259.0 | 2021-05-01 |
2150.0 | 2021-08-01 | 2150.0 | 2022-08-01 | 2150.0 | 2023-08-01 |
etc. This ranges up to 5 years and 380+ rows. There are four types of customers, each with its own table set up like the above: Annual paying, Bi-Annual paying, Quarterly paying and Monthly paying.
I also have an empty dataframe (SumOfPayments) whose index is based on variables which update each month, and whose columns are the customer types mentioned above.
It looks like this:
Annual | Bi-Annual | Quarterly | Monthly | |
---|---|---|---|---|
12monthsago | ||||
11monthsago | ||||
10monthsago |
etc., until it hits 60 months into the future.
The indexes on SumOfPayments and the YearXPaydays are all set to the 1st of their respective months, so they can be matched with ==.
(As an example of how the index variables on the SumOfPayments table are set:)
12monthsago = datetime.today().replace(day=1, hour=0, minute=0, second=0, microsecond=0) + relativedelta(months=-12)
So if today's date is 13/08/2021, the above would produce 2020-08-01 00:00:00.
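As a sketch of an alternative (assuming pandas is available, since SumOfPayments is already a dataframe), the whole index could be generated in one go instead of one variable per month:

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta
import pandas as pd

# First of the month, 12 months ago, with the time zeroed out
start = datetime.today().replace(day=1, hour=0, minute=0,
                                 second=0, microsecond=0) + relativedelta(months=-12)

# 73 month-starts: 12 back, the current month, and 60 into the future
index = pd.date_range(start=start, periods=73, freq=pd.offsets.MonthBegin())
```

Every entry lands on the 1st of a month, so it matches the YearXPayday dates directly.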
The intention behind this is to:
order the YearXPaydays by date and total the YearXAmounts grouped by that date
then check those grouped sums against the index of the SumOfPayments dataframe, and enter each sum wherever the dates match.
Example (based on the tables above):
AnnualContracts:
Year1Amount | Year1Payday | Year2Amount | Year2Payday | Year3Amount | Year3Payday |
---|---|---|---|---|---|
1000.0 | 2020-08-01 | 1000.0 | 2021-08-01 | 1000.0 | 2022-08-01 |
2400.0 | 2021-06-01 | 3400.0 | 2022-06-01 | 4400.0 | 2023-06-01 |
1259.0 | 2019-05-01 | 1259.0 | 2020-05-01 | 1259.0 | 2021-05-01 |
2150.0 | 2021-08-01 | 2150.0 | 2022-08-01 | 2150.0 | 2023-08-01 |
SumOfPayments:
Annual | Bi-Annual | Quarterly | Monthly | |
---|---|---|---|---|
12monthsago | 1000.0 | |||
11monthsago | ||||
10monthsago | ||||
9monthsago | ||||
8monthsago | ||||
7monthsago | ||||
6monthsago | ||||
5monthsago | ||||
4monthsago | ||||
3monthsago | 1259.0 | |||
2monthsago | 2400.0 | |||
1monthsago | ||||
currentmonth | 3150.0 |
Any help on this would be massively appreciated, thanks in advance for any assistance.
You could use wide_to_long if your column names were a little different. Instead, I'll just split and melt them to get the data in the right shape. If you're curious what's happening, just print out dt and amt to see what they look like after melting.
Then you can create your output table using 13 periods (this month plus the past 12 months), starting from the beginning of the month one year ago.
You can create a table for each level of aggregation you want (annual, bi-annual, etc.), then merge each one onto the table with the date range.
import pandas as pd
from datetime import date, timedelta
df = pd.DataFrame({'Year1Amount': {0: 1000.0, 1: 2400.0, 2: 1259.0, 3: 2150.0},
'Year1Payday': {0: '2020-08-01',
1: '2021-06-01',
2: '2019-05-01',
3: '2021-08-01'},
'Year2Amount': {0: 1000.0, 1: 3400.0, 2: 1259.0, 3: 2150.0},
'Year2Payday': {0: '2021-08-01',
1: '2022-06-01',
2: '2020-05-01',
3: '2022-08-01'},
'Year3Amount': {0: 1000.0, 1: 4400.0, 2: 1259.0, 3: 2150.0},
'Year3Payday': {0: '2022-08-01',
1: '2023-06-01',
2: '2021-05-01',
3: '2023-08-01'}})
hist = pd.DataFrame({'Date':pd.date_range(start=(date.today() - timedelta(days=365)).replace(day=1),
freq=pd.offsets.MonthBegin(),
periods=13)})
# Split and melt
dt = df[[x for x in df.columns if 'Payday' in x]].melt(value_name='Date')
amt = df[[x for x in df.columns if 'Amount' in x]].melt(value_name='Annual')
# Combine and make datetime
df = pd.concat([amt['Annual'], dt['Date']],axis=1)
df['Date'] = pd.to_datetime(df['Date'])
# Do all of your aggregations into new dataframes like such, you'll need one for each column
# here's how to do the annual one
annual_sum = df.groupby('Date', as_index=False).sum()
# For each aggregation, merge to the hist df
hist = hist.merge(annual_sum, on='Date', how='left')
Output
Date Annual
0 2020-08-01 1000.0
1 2020-09-01 NaN
2 2020-10-01 NaN
3 2020-11-01 NaN
4 2020-12-01 NaN
5 2021-01-01 NaN
6 2021-02-01 NaN
7 2021-03-01 NaN
8 2021-04-01 NaN
9 2021-05-01 1259.0
10 2021-06-01 2400.0
11 2021-07-01 NaN
12 2021-08-01 3150.0
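For completeness, here is a sketch of the wide_to_long route mentioned above. It needs the numeric suffix at the end of each column name (AmountX rather than YearXAmount), so the rename step below (a regex of my own devising, not part of your data) does that first:

```python
import re
import pandas as pd

df = pd.DataFrame({'Year1Amount': [1000.0, 2400.0, 1259.0, 2150.0],
                   'Year1Payday': ['2020-08-01', '2021-06-01', '2019-05-01', '2021-08-01'],
                   'Year2Amount': [1000.0, 3400.0, 1259.0, 2150.0],
                   'Year2Payday': ['2021-08-01', '2022-06-01', '2020-05-01', '2022-08-01'],
                   'Year3Amount': [1000.0, 4400.0, 1259.0, 2150.0],
                   'Year3Payday': ['2022-08-01', '2023-06-01', '2021-05-01', '2023-08-01']})

# Year1Amount -> Amount1, Year2Payday -> Payday2, etc.
renamed = df.rename(columns=lambda c: re.sub(r'Year(\d+)(Amount|Payday)', r'\2\1', c))

# wide_to_long needs an id column, so expose the row index as one
long = pd.wide_to_long(renamed.reset_index(),
                       stubnames=['Amount', 'Payday'],
                       i='index', j='year')
long['Payday'] = pd.to_datetime(long['Payday'])

annual_sum = long.groupby('Payday', as_index=False)['Amount'].sum()
```

This produces the same per-date sums as the melt approach (e.g. 3150.0 for 2021-08-01), so it is purely a matter of taste which reshaping you prefer.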