Search code examples
sqlamazon-web-servicesamazon-redshiftdatagrip

SQL - Trickle down an overpayment to the following months


I have a table in amazon redshift (datagrip) like so:

Contract_ID Starting_Month Contract_Duration_In_Months Collection_Due_Date Target Amount_Collected
10001 01/01/2022 12 01/01/2022 10000 40000
10001 01/01/2022 12 01/02/2022 10000 0
10001 01/01/2022 12 01/03/2022 10000 0
10001 01/01/2022 12 01/04/2022 10000 0
10001 01/01/2022 12 01/05/2022 10000 0
10001 01/01/2022 12 01/06/2022 10000 0
10001 01/01/2022 12 01/07/2022 10000 0
10001 01/01/2022 12 01/08/2022 10000 30000
10001 01/01/2022 12 01/09/2022 10000 2500
10001 01/01/2022 12 01/10/2022 10000 0
10001 01/01/2022 12 01/11/2022 10000 0
10001 01/01/2022 12 01/12/2022 10000 0
10002 01/01/2022 8 01/03/2022 5000 12000
10002 01/01/2022 8 01/04/2022 5000 1000
10002 01/01/2022 8 01/05/2022 5000 0
10002 01/01/2022 8 01/06/2022 5000 0
10002 01/01/2022 8 01/07/2022 5000 10000
10002 01/01/2022 8 01/08/2022 5000 0
10002 01/01/2022 8 01/09/2022 5000 0
10002 01/01/2022 8 01/10/2022 5000 0

Whereby each month I have a target amount and a payment (amount_collected), and each month I want to see how much of my target I've achieved (the achieved amount can never be higher than the target), if the payment exceeds the target then the amount achieved will trickle down to the next month. Like so:

Contract_ID Starting_Month Contract_Duration_In_Months Collection_Due_Date Target Amount_Collected Achieved Overpayment
10001 01/01/2022 12 01/01/2022 10000 40000 10000 30000
10001 01/01/2022 12 01/02/2022 10000 0 10000 20000
10001 01/01/2022 12 01/03/2022 10000 0 10000 10000
10001 01/01/2022 12 01/04/2022 10000 0 10000 0
10001 01/01/2022 12 01/05/2022 10000 0 0 0
10001 01/01/2022 12 01/06/2022 10000 0 0 0
10001 01/01/2022 12 01/07/2022 10000 0 0 0
10001 01/01/2022 12 01/08/2022 10000 30000 10000 20000
10001 01/01/2022 12 01/09/2022 10000 2500 10000 12500
10001 01/01/2022 12 01/10/2022 10000 0 10000 2500
10001 01/01/2022 12 01/11/2022 10000 0 2500 0
10001 01/01/2022 12 01/12/2022 10000 0 0 0
10002 01/01/2022 8 01/03/2022 5000 12000 5000 7000
10002 01/01/2022 8 01/04/2022 5000 1000 5000 3000
10002 01/01/2022 8 01/05/2022 5000 0 3000 0
10002 01/01/2022 8 01/06/2022 5000 0 0 0
10002 01/01/2022 8 01/07/2022 5000 10000 5000 5000
10002 01/01/2022 8 01/08/2022 5000 0 5000 0
10002 01/01/2022 8 01/09/2022 5000 0 0 0
10002 01/01/2022 8 01/10/2022 5000 0 0 0

I can't use SUM() OVER(ORDER BY MONTH ROWS UNBOUNDED PRECEDING) because even though the achievements in May, June and July were missed, the achieved column only looks forward, not backwards. I think the best way to achieve this would be to have an Overpayment column which is like a rolling sum of the previous overpayment values - the previously achieved value BUT if the overpayment was previously 0 then we have a month where there is an overpayment the previously rolling sum needs to be discarded and the overpayment value reset to the payment-target.

Simply put, I need overpayments to trickle to down the achieved columns over the months taking into consideration the payments made until overpayments are finished, ignoring any months where the target wasn't achieved.


Solution

  • Interesting challenge! Probably one of the hardest SQL questions I've faced recently - so hopefully someone doesn't come along with a simpler/obvious solution :)

    This appears to give the correct result based on the sample data provided. I've split it up into multiple CTEs to hopefully make the logic clearer; some of the CTEs could probably be combined:

    -- Creates an identifier each time an amount is collected 
    -- and groups all subsequent records until the next amount is collected
    WITH data1
         AS (SELECT *,
                    SUM(CASE
                          WHEN amount_collected > 0 THEN 1
                          ELSE 0
                        END)
                      over (
                        PARTITION BY contract_id
                        ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
                      preceding
                      AND
                      CURRENT ROW) AS TRACKER
             FROM   trickle)
    -- Provides a cumulative sum of the target/collected/balance amounts 
    -- within each 'tracker' group. LAG(TRACKER) is needed for a subsequent step
    ,
         data2
         AS (SELECT *,
                    SUM(target)
                      over (
                        PARTITION BY contract_id, tracker
                        ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
                      preceding
                      AND
                      CURRENT ROW)                        TARGET_RUNNING,
                    SUM(amount_collected)
                      over (
                        PARTITION BY contract_id, tracker
                        ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
                      preceding
                      AND
                      CURRENT ROW)                        AMT_COLL_RUNNING,
                    SUM(amount_collected - target)
                      over (
                        PARTITION BY contract_id, tracker
                        ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
                      preceding
                      AND
                      CURRENT ROW)                        BAL_RUNNING,
                    Lag(tracker)
                      over (
                        PARTITION BY contract_id
                        ORDER BY collection_due_date ASC) LAG_TRACKER
             FROM   data1)
    -- BAL_RUNNING is calculated using a windows function so lag_run_bal needs to be 
    -- in another CTE as nested windows functions aren't allowed 
    ,
         data3
         AS (SELECT *,
                    Lag(bal_running)
                      over (
                        PARTITION BY contract_id
                        ORDER BY collection_due_date ASC) LAG_RUN_BAL
             FROM   data2)
    -- Calculates the available balance
    ,
         data4
         AS (SELECT *,
                    CASE
                      WHEN tracker <> lag_tracker
                           AND lag_run_bal > 0 THEN amt_coll_running + lag_run_bal
                      ELSE amt_coll_running
                    END AMT_COLL_RUNNING1
             FROM   data3)
    SELECT contract_id,
           collection_due_date,
           target,
           amount_collected,
           CASE
             WHEN target_running <= amt_coll_running1 THEN target
             ELSE 0
           END                                             ACHIEVED,
           Greatest(amt_coll_running1 - target_running, 0) OVERPAYMENT
    FROM   data4
    ORDER  BY contract_id,
              collection_due_date ASC; 
    

    Updated Solution

    I think there may be a typo in your output table:

    Contract_ID Starting_Month Contract_Duration_In_Months Collection_Due_Date Target Amount_Collected Achieved Overpayment
    10001 01/01/2022 12 01/09/2022 10000 2500 10000 12500
    10001 01/01/2022 12 01/10/2022 10000 0 2500 0
    10001 01/01/2022 12 01/11/2022 10000 0 0 0

    should be:

    Contract_ID Starting_Month Contract_Duration_In_Months Collection_Due_Date Target Amount_Collected Achieved Overpayment
    10001 01/01/2022 12 01/09/2022 10000 2500 10000 12500
    10001 01/01/2022 12 01/10/2022 10000 0 10000 2500
    10001 01/01/2022 12 01/11/2022 10000 0 2500 0

    Assuming that is a typo and my correction is correct then this SQL should give you what you want:

    WITH DATA0 AS (
      SELECT 
        *, 
        NVL(
          LAG(AMOUNT_COLLECTED) OVER (
            PARTITION BY CONTRACT_ID 
            ORDER BY 
              COLLECTION_DUE_DATE ASC
          ), 
          0
        ) LAG_AMT_COLL 
      FROM 
        TRICKLE
    ), 
    DATA1 AS (
      SELECT 
        *, 
        SUM(
          CASE WHEN amount_collected > 0 
          AND LAG_AMT_COLL = 0 THEN 1 ELSE 0 END
        ) over (
          PARTITION BY contract_id 
          ORDER BY 
            collection_due_date ASC ROWS BETWEEN unbounded preceding 
            AND CURRENT ROW
        ) AS TRACKER 
      FROM 
        DATA0
    ), 
    DATA2 AS (
      SELECT 
        *, 
        SUM(TARGET) OVER (
          PARTITION BY CONTRACT_ID, 
          TRACKER 
          ORDER BY 
            COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING 
            AND CURRENT ROW
        ) TARGET_RUNNING, 
        SUM(AMOUNT_COLLECTED) OVER (
          PARTITION BY CONTRACT_ID, 
          TRACKER 
          ORDER BY 
            COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING 
            AND CURRENT ROW
        ) AMT_COLL_RUNNING, 
        SUM(AMOUNT_COLLECTED - TARGET) OVER (
          PARTITION BY CONTRACT_ID, 
          TRACKER 
          ORDER BY 
            COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING 
            AND CURRENT ROW
        ) BAL_RUNNING, 
        Lag(tracker) over (
          PARTITION BY contract_id 
          ORDER BY 
            collection_due_date ASC
        ) LAG_TRACKER 
      FROM 
        DATA1
    ), 
    DATA3 AS (
      SELECT 
        *, 
        GREATEST(BAL_RUNNING, 0) BAL_RUNNING1 
      FROM 
        DATA2
    ) 
    SELECT 
      CONTRACT_ID, 
      STARTING_MONTH, 
      CONTRACT_DURATION_IN_MONTHS, 
      COLLECTION_DUE_DATE, 
      TARGET, 
      AMOUNT_COLLECTED, 
      LEAST(
        TARGET, 
        NVL(
          LAG(BAL_RUNNING1) over (
            PARTITION BY contract_id 
            ORDER BY 
              collection_due_date ASC
          ), 
          0
        ) + AMOUNT_COLLECTED
      ) ACHIEVED, 
      Greatest(
        amt_coll_running - target_running, 
        0
      ) OVERPAYMENT 
    FROM 
      DATA3 
    ORDER BY 
      CONTRACT_ID, 
      COLLECTION_DUE_DATE