Tags: python, pandas, datetime, pandas-groupby, sumifs

Populating empty data frame from conditional checks on a table


I’m looking for some bearings on how to approach this problem.

I have a table, 'AnnualContracts', which contains the following types of data:

Year1Amount Year1Payday  Year2Amount Year2Payday  Year3Amount Year3Payday
     1000.0  2020-08-01       1000.0  2021-08-01       1000.0  2022-08-01
     2400.0  2021-06-01       3400.0  2022-06-01       4400.0  2023-06-01
     1259.0  2019-05-01       1259.0  2020-05-01       1259.0  2021-05-01
     2150.0  2021-08-01       2150.0  2022-08-01       2150.0  2023-08-01

This pattern continues up to 5 years and 380+ rows. There are four types of customers, each with its own table set up like the one above: Annual paying, Bi-Annual paying, Quarterly paying and Monthly paying.

I also have an empty dataframe (SumOfPayments) whose index is based on variables which update each month, and whose columns are the customer types mentioned above.

It looks like this:

             Annual  Bi-Annual  Quarterly  Monthly
12monthsago
11monthsago
10monthsago

and so on, until it reaches 60 months into the future.

The indexes on the SumOfPayments dataframe and the YearXPayday values are all set to the 1st of their respective months, so they can be matched with ==.

(as an example of how the index variables are set on the SumOfPayments table):

from datetime import datetime
from dateutil.relativedelta import relativedelta

# "12monthsago" isn't a valid Python identifier; using an equivalent name:
twelve_months_ago = datetime.today().replace(day=1, hour=0, minute=0, second=0, microsecond=0) + relativedelta(months=-12)

So if today's date is 13/08/2021, the above would produce 2020-08-01 00:00:00.

The intention behind this is to:

  1. group the YearXPaydays by date, and total the YearXAmounts for each grouped date

  2. check those grouped sums against the index on the SumOfPayments dataframe, and enter each sum wherever the dates match
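The two steps above can be sketched as follows. This is a minimal sketch using toy stand-in data, assuming each (YearXAmount, YearXPayday) column pair can be stacked into one long (date, amount) table:

```python
import pandas as pd

# Toy stand-in for AnnualContracts (assumed column layout, two years only)
contracts = pd.DataFrame({
    'Year1Amount': [1000.0, 2400.0],
    'Year1Payday': ['2021-08-01', '2021-06-01'],
    'Year2Amount': [1000.0, 3400.0],
    'Year2Payday': ['2022-08-01', '2022-06-01'],
})

# Step 1: stack every (Amount, Payday) pair into long form and sum per date
pairs = [
    contracts[[f'Year{i}Amount', f'Year{i}Payday']]
        .set_axis(['Amount', 'Payday'], axis=1)
    for i in (1, 2)
]
long = pd.concat(pairs, ignore_index=True)
long['Payday'] = pd.to_datetime(long['Payday'])
sums = long.groupby('Payday')['Amount'].sum()

# Step 2: align the sums onto a month-start index (stand-in for SumOfPayments)
idx = pd.date_range('2021-06-01', '2022-08-01', freq='MS')
annual = sums.reindex(idx)  # NaN where no payments fall in that month
```

Because both sides use month-start timestamps, `reindex` does the == matching for you, leaving NaN in the months with no payments.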

example (based on the tables above)

AnnualContracts:

Year1Amount Year1Payday  Year2Amount Year2Payday  Year3Amount Year3Payday
     1000.0  2020-08-01       1000.0  2021-08-01       1000.0  2022-08-01
     2400.0  2021-06-01       3400.0  2022-06-01       4400.0  2023-06-01
     1259.0  2019-05-01       1259.0  2020-05-01       1259.0  2021-05-01
     2150.0  2021-08-01       2150.0  2022-08-01       2150.0  2023-08-01

SumOfPayments:

              Annual  Bi-Annual  Quarterly  Monthly
12monthsago   1000.0
11monthsago
10monthsago
9monthsago
8monthsago
7monthsago
6monthsago
5monthsago
4monthsago
3monthsago    1259.0
2monthsago    2400.0
1monthsago
currentmonth  3150.0

Any help on this would be massively appreciated, thanks in advance for any assistance.


Solution

  • You could use wide_to_long if your column names were a little different. Instead I'll just split and melt them to get the data into the right shape. If you're curious about what's happening, print out dt and amt to see what they look like after melting.

    Then you can create your output table using 13 periods (this month plus the past 12 months) and start it from the beginning of the month one year ago.

    You can create multiple tables, one for each level of aggregation you want (annual, bi-annual, etc.). Then just merge them to the table with the date range.
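For reference, here is a hedged sketch of the wide_to_long route mentioned above. pd.wide_to_long needs the stub name first (Amount1, Payday1, ...), so the assumption here is that renaming Year1Amount to Amount1 and so on is acceptable:

```python
import pandas as pd

df = pd.DataFrame({'Year1Amount': [1000.0, 2400.0],
                   'Year1Payday': ['2020-08-01', '2021-06-01'],
                   'Year2Amount': [1000.0, 3400.0],
                   'Year2Payday': ['2021-08-01', '2022-06-01']})

# wide_to_long wants stub-prefixed names: 'Year1Amount' -> 'Amount1'
df = df.rename(columns=lambda c: c[5:] + c[4])
df['id'] = df.index  # wide_to_long needs an id column

long = (pd.wide_to_long(df, stubnames=['Amount', 'Payday'], i='id', j='Year')
          .reset_index())
long['Payday'] = pd.to_datetime(long['Payday'])
```

This yields one row per contract-year with Amount and Payday columns, the same long shape the melt approach below produces.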

    import pandas as pd
    from datetime import date, timedelta
    
    df = pd.DataFrame({'Year1Amount': {0: 1000.0, 1: 2400.0, 2: 1259.0, 3: 2150.0},
     'Year1Payday': {0: '2020-08-01',
      1: '2021-06-01',
      2: '2019-05-01',
      3: '2021-08-01'},
     'Year2Amount': {0: 1000.0, 1: 3400.0, 2: 1259.0, 3: 2150.0},
     'Year2Payday': {0: '2021-08-01',
      1: '2022-06-01',
      2: '2020-05-01',
      3: '2022-08-01'},
     'Year3Amount': {0: 1000.0, 1: 4400.0, 2: 1259.0, 3: 2150.0},
     'Year3Payday': {0: '2022-08-01',
      1: '2023-06-01',
      2: '2021-05-01',
      3: '2023-08-01'}})
    
    hist = pd.DataFrame({'Date':pd.date_range(start=(date.today() - timedelta(days=365)).replace(day=1),
                                              freq=pd.offsets.MonthBegin(),
                                              periods=13)})
    
    
    
    # Split and melt
    dt = df[[x for x in df.columns if 'Payday' in x]].melt(value_name='Date')
    amt = df[[x for x in df.columns if 'Amount' in x]].melt(value_name='Annual')
    
    # Combine and make datetime
    df = pd.concat([amt['Annual'], dt['Date']],axis=1)
    df['Date'] = pd.to_datetime(df['Date'])
    
    
    # Do each aggregation into a new dataframe like this; you'll need one for each column.
    # Here's how to do the annual one:
    annual_sum = df.groupby('Date', as_index=False).sum()
    
    # For each aggregation, merge to the hist df
    hist = hist.merge(annual_sum, on='Date', how='left')
    

    Output

             Date  Annual
    0  2020-08-01  1000.0
    1  2020-09-01     NaN
    2  2020-10-01     NaN
    3  2020-11-01     NaN
    4  2020-12-01     NaN
    5  2021-01-01     NaN
    6  2021-02-01     NaN
    7  2021-03-01     NaN
    8  2021-04-01     NaN
    9  2021-05-01  1259.0
    10 2021-06-01  2400.0
    11 2021-07-01     NaN
    12 2021-08-01  3150.0
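To get from this output back to the SumOfPayments layout in the question, one option (a sketch, assuming month-start dates on both sides and a fixed "today" so it is reproducible) is to label each date by its offset in months from the current month:

```python
import pandas as pd

# Stand-in for the merged hist table above
hist = pd.DataFrame({'Date': pd.date_range('2020-08-01', periods=13, freq='MS'),
                     'Annual': [1000.0] + [None] * 12})

today = pd.Timestamp('2021-08-13')  # fixed "today" for reproducibility
current = today.normalize().replace(day=1)

# Offset in whole months between the current month and each row's date
months_back = ((current.year - hist['Date'].dt.year) * 12
               + (current.month - hist['Date'].dt.month))
hist.index = [f'{m}monthsago' if m > 0 else 'currentmonth' for m in months_back]
```

The same relabelling would apply to the bi-annual, quarterly and monthly columns once they are merged in.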