I have a table in amazon redshift (datagrip) like so:
Contract_ID | Starting_Month | Contract_Duration_In_Months | Collection_Due_Date | Target | Amount_Collected |
---|---|---|---|---|---|
10001 | 01/01/2022 | 12 | 01/01/2022 | 10000 | 40000 |
10001 | 01/01/2022 | 12 | 01/02/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/03/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/04/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/05/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/06/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/07/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/08/2022 | 10000 | 30000 |
10001 | 01/01/2022 | 12 | 01/09/2022 | 10000 | 2500 |
10001 | 01/01/2022 | 12 | 01/10/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/11/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/12/2022 | 10000 | 0 |
10002 | 01/01/2022 | 8 | 01/03/2022 | 5000 | 12000 |
10002 | 01/01/2022 | 8 | 01/04/2022 | 5000 | 1000 |
10002 | 01/01/2022 | 8 | 01/05/2022 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/06/2022 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/07/2022 | 5000 | 10000 |
10002 | 01/01/2022 | 8 | 01/08/2022 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/09/2022 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/10/2022 | 5000 | 0 |
Whereby each month I have a target amount and a payment (amount_collected), and each month I want to see how much of my target I've achieved (the achieved amount can never be higher than the target), if the payment exceeds the target then the amount achieved will trickle down to the next month. Like so:
Contract_ID | Starting_Month | Contract_Duration_In_Months | Collection_Due_Date | Target | Amount_Collected | Achieved | Overpayment |
---|---|---|---|---|---|---|---|
10001 | 01/01/2022 | 12 | 01/01/2022 | 10000 | 40000 | 10000 | 30000 |
10001 | 01/01/2022 | 12 | 01/02/2022 | 10000 | 0 | 10000 | 20000 |
10001 | 01/01/2022 | 12 | 01/03/2022 | 10000 | 0 | 10000 | 10000 |
10001 | 01/01/2022 | 12 | 01/04/2022 | 10000 | 0 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/05/2022 | 10000 | 0 | 0 | 0 |
10001 | 01/01/2022 | 12 | 01/06/2022 | 10000 | 0 | 0 | 0 |
10001 | 01/01/2022 | 12 | 01/07/2022 | 10000 | 0 | 0 | 0 |
10001 | 01/01/2022 | 12 | 01/08/2022 | 10000 | 30000 | 10000 | 20000 |
10001 | 01/01/2022 | 12 | 01/09/2022 | 10000 | 2500 | 10000 | 12500 |
10001 | 01/01/2022 | 12 | 01/10/2022 | 10000 | 0 | 10000 | 2500 |
10001 | 01/01/2022 | 12 | 01/11/2022 | 10000 | 0 | 2500 | 0 |
10001 | 01/01/2022 | 12 | 01/12/2022 | 10000 | 0 | 0 | 0 |
10002 | 01/01/2022 | 8 | 01/03/2022 | 5000 | 12000 | 5000 | 7000 |
10002 | 01/01/2022 | 8 | 01/04/2022 | 5000 | 1000 | 5000 | 3000 |
10002 | 01/01/2022 | 8 | 01/05/2022 | 5000 | 0 | 3000 | 0 |
10002 | 01/01/2022 | 8 | 01/06/2022 | 5000 | 0 | 0 | 0 |
10002 | 01/01/2022 | 8 | 01/07/2022 | 5000 | 10000 | 5000 | 5000 |
10002 | 01/01/2022 | 8 | 01/08/2022 | 5000 | 0 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/09/2022 | 5000 | 0 | 0 | 0 |
10002 | 01/01/2022 | 8 | 01/10/2022 | 5000 | 0 | 0 | 0 |
I can't use SUM() OVER(ORDER BY MONTH ROWS UNBOUNDED PRECEDING) because even though the achievements in May, June and July were missed, the achieved column only looks forward, not backwards. I think the best way to achieve this would be to have an Overpayment column which is like a rolling sum of the previous overpayment values - the previously achieved value BUT if the overpayment was previously 0 then we have a month where there is an overpayment the previously rolling sum needs to be discarded and the overpayment value reset to the payment-target.
Simply put, I need overpayments to trickle to down the achieved columns over the months taking into consideration the payments made until overpayments are finished, ignoring any months where the target wasn't achieved.
Interesting challenge! Probably one of the hardest SQL questions I've faced recently - so hopefully someone doesn't come along with a simpler/obvious solution :)
This appears to give the correct result based on the sample data provided. I've split it up into multiple CTEs to hopefully make the logic clearer; some of the CTEs could probably be combined:
-- Creates an identifier each time an amount is collected
-- and groups all subsequent records until the next amount is collected
WITH data1
AS (SELECT *,
SUM(CASE
WHEN amount_collected > 0 THEN 1
ELSE 0
END)
over (
PARTITION BY contract_id
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) AS TRACKER
FROM trickle)
-- Provides a cumulative sum of the target/collected/balance amounts
-- within each 'tracker' group. LAG(TRACKER) is needed for a subsequent step
,
data2
AS (SELECT *,
SUM(target)
over (
PARTITION BY contract_id, tracker
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) TARGET_RUNNING,
SUM(amount_collected)
over (
PARTITION BY contract_id, tracker
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) AMT_COLL_RUNNING,
SUM(amount_collected - target)
over (
PARTITION BY contract_id, tracker
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) BAL_RUNNING,
Lag(tracker)
over (
PARTITION BY contract_id
ORDER BY collection_due_date ASC) LAG_TRACKER
FROM data1)
-- BAL_RUNNING is calculated using a windows function so lag_run_bal needs to be
-- in another CTE as nested windows functions aren't allowed
,
data3
AS (SELECT *,
Lag(bal_running)
over (
PARTITION BY contract_id
ORDER BY collection_due_date ASC) LAG_RUN_BAL
FROM data2)
-- Calculates the available balance
,
data4
AS (SELECT *,
CASE
WHEN tracker <> lag_tracker
AND lag_run_bal > 0 THEN amt_coll_running + lag_run_bal
ELSE amt_coll_running
END AMT_COLL_RUNNING1
FROM data3)
SELECT contract_id,
collection_due_date,
target,
amount_collected,
CASE
WHEN target_running <= amt_coll_running1 THEN target
ELSE 0
END ACHIEVED,
Greatest(amt_coll_running1 - target_running, 0) OVERPAYMENT
FROM data4
ORDER BY contract_id,
collection_due_date ASC;
Updated Solution
I think there may be a typo in your output table:
Contract_ID | Starting_Month | Contract_Duration_In_Months | Collection_Due_Date | Target | Amount_Collected | Achieved | Overpayment |
---|---|---|---|---|---|---|---|
10001 | 01/01/2022 | 12 | 01/09/2022 | 10000 | 2500 | 10000 | 12500 |
10001 | 01/01/2022 | 12 | 01/10/2022 | 10000 | 0 | 2500 | 0 |
10001 | 01/01/2022 | 12 | 01/11/2022 | 10000 | 0 | 0 | 0 |
should be:
Contract_ID | Starting_Month | Contract_Duration_In_Months | Collection_Due_Date | Target | Amount_Collected | Achieved | Overpayment |
---|---|---|---|---|---|---|---|
10001 | 01/01/2022 | 12 | 01/09/2022 | 10000 | 2500 | 10000 | 12500 |
10001 | 01/01/2022 | 12 | 01/10/2022 | 10000 | 0 | 10000 | 2500 |
10001 | 01/01/2022 | 12 | 01/11/2022 | 10000 | 0 | 2500 | 0 |
Assuming that is a typo and my correction is correct then this SQL should give you what you want:
WITH DATA0 AS (
SELECT
*,
NVL(
LAG(AMOUNT_COLLECTED) OVER (
PARTITION BY CONTRACT_ID
ORDER BY
COLLECTION_DUE_DATE ASC
),
0
) LAG_AMT_COLL
FROM
TRICKLE
),
DATA1 AS (
SELECT
*,
SUM(
CASE WHEN amount_collected > 0
AND LAG_AMT_COLL = 0 THEN 1 ELSE 0 END
) over (
PARTITION BY contract_id
ORDER BY
collection_due_date ASC ROWS BETWEEN unbounded preceding
AND CURRENT ROW
) AS TRACKER
FROM
DATA0
),
DATA2 AS (
SELECT
*,
SUM(TARGET) OVER (
PARTITION BY CONTRACT_ID,
TRACKER
ORDER BY
COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) TARGET_RUNNING,
SUM(AMOUNT_COLLECTED) OVER (
PARTITION BY CONTRACT_ID,
TRACKER
ORDER BY
COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AMT_COLL_RUNNING,
SUM(AMOUNT_COLLECTED - TARGET) OVER (
PARTITION BY CONTRACT_ID,
TRACKER
ORDER BY
COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) BAL_RUNNING,
Lag(tracker) over (
PARTITION BY contract_id
ORDER BY
collection_due_date ASC
) LAG_TRACKER
FROM
DATA1
),
DATA3 AS (
SELECT
*,
GREATEST(BAL_RUNNING, 0) BAL_RUNNING1
FROM
DATA2
)
SELECT
CONTRACT_ID,
STARTING_MONTH,
CONTRACT_DURATION_IN_MONTHS,
COLLECTION_DUE_DATE,
TARGET,
AMOUNT_COLLECTED,
LEAST(
TARGET,
NVL(
LAG(BAL_RUNNING1) over (
PARTITION BY contract_id
ORDER BY
collection_due_date ASC
),
0
) + AMOUNT_COLLECTED
) ACHIEVED,
Greatest(
amt_coll_running - target_running,
0
) OVERPAYMENT
FROM
DATA3
ORDER BY
CONTRACT_ID,
COLLECTION_DUE_DATE