I'm working on HIVE.
I have the following data:
Date | WholeBudget | Target PocketA | PocketA | PocketB |
---|---|---|---|---|
2024-01-01 | 60 | 75% | 45 | 15 |
2024-01-02 | 80 | 75% | 60 | 20 |
2024-01-03 | 100 | 50% | 60 | 40 |
2024-01-04 | 110 | 50% | 60 | 50 |
2024-01-05 | 120 | 50% | 60 | 60 |
2024-01-06 | 130 | 50% | 65 | 65 |
2024-01-07 | 140 | 50% | 70 | 70 |
Date is an input on already existing data.
WholeBudget and Target PocketA are also inputs (actually already computed values using LAG shenanigans).
PoketA is the goal. Its formula is: PocketA(d) = MAX(PocketA(d-1), WholeBudget(d) x Target(d))
where d is the date on the data line.
PocketB is simply the remainder: PocketB(d) = WholeBudget(d) - PocketA(d)
Whole budget is strictly increasing at varying speed.
Pockets are not allowed to decrease (thus the max in PocketA formula).
Target PocketA can decrease. That's why, sometimes, PocketA is above target and waits for PocketB to catch on.
I don't think PocketA is doable with LAG() as it is referencing the previous value as part of the computation.
I'm working on a WHILE loop over all the dates to get it. Although it's an acceptable solution as the number of dates is capped by the current year (max ~200 working days), it still hurts my soul.
Is there a smarter solution working on hive?
To clarify as requested:
Here is the provided data to reproduce:
Date | WholeBudget | Target PocketA |
---|---|---|
2024-01-01 | 60 | 75% |
2024-01-02 | 80 | 75% |
2024-01-03 | 100 | 50% |
2024-01-04 | 110 | 50% |
2024-01-05 | 120 | 50% |
2024-01-06 | 130 | 50% |
2024-01-07 | 140 | 50% |
And here is the expected result:
Date | PocketA | PocketB |
---|---|---|
2024-01-01 | 45 | 15 |
2024-01-02 | 60 | 20 |
2024-01-03 | 60 | 40 |
2024-01-04 | 60 | 50 |
2024-01-05 | 60 | 60 |
2024-01-06 | 65 | 65 |
2024-01-07 | 70 | 70 |
I'd calculate it naively (Budget * Target) and then use a window function to look back at ALL previous results (including the current row) and take the max.
WITH
calc_pa AS
(
SELECT
date,
WholeBudget,
Target,
MAX(WholeBudget * Target) OVER (ORDER BY date) AS PocketA
FROM
yourTable
)
SELECT
*,
WholeBudget - PocketA AS PocketB
FROM
calc_pa