I have the below data which contains monthly targets for a set of ids. The targets are for each id, for every month in 2020. The table named targets
. The month
column indicates the month of year.
+-------+-------+----+--------+
| month | name | id | target |
+-------+-------+----+--------+
| 1 | Comp1 | 1 | 6000 |
+-------+-------+----+--------+
| 2 | Comp1 | 1 | 6000 |
+-------+-------+----+--------+
| 3 | Comp1 | 1 | 6000 |
+-------+-------+----+--------+
| 1 | Comp2 | 2 | 6000 |
+-------+-------+----+--------+
| 2 | Comp2 | 2 | 6000 |
+-------+-------+----+--------+
| 3 | Comp2 | 2 | 6000 |
+-------+-------+----+--------+
| 1 | Comp3 | 3 | 6000 |
+-------+-------+----+--------+
| 2 | Comp3 | 3 | 6000 |
+-------+-------+----+--------+
| 3 | Comp3 | 3 | 6000 |
+-------+-------+----+--------+
| 1 | Comp4 | 4 | 6000 |
+-------+-------+----+--------+
| 2 | Comp4 | 4 | 6000 |
+-------+-------+----+--------+
| 3 | Comp4 | 4 | 6000 |
+-------+-------+----+--------+
I then have a second table which contains daily data for a set of ids and is updated daily. In my actual dataset, I've got data from 2019-01-01 up until current day.
+------------+-------+----+--------+--------+
| yyyy_mm_dd | name | id | actual | region |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp1 | 1 | 1000 | LATAM |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp1 | 1 | 0 | EU |
+-------------------------------------------+
| 2019-01-02 | Comp1 | 1 | 2000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-03 | Comp1 | 1 | 4000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp2 | 2 | 1000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-02 | Comp2 | 2 | 2000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-03 | Comp2 | 2 | 3000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp3 | 3 | 1000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-02 | Comp3 | 3 | 2000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-03 | Comp3 | 3 | 8000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-01 | Comp4 | 4 | 1000 | EU |
+------------+-------+----+--------+--------+
| 2019-01-02 | Comp4 | 4 | 2000 | EU |
+------------+-------+----+--------+--------+
| 2019-02-03 | Comp4 | 4 | 3000 | EU |
+------------+-------+----+--------+--------+
Based on the above two tables, I want to create a third table with some additional logic. Ultimately, I want to have a new column introduced called payment
. This column should always be 0 unless the company has passed their monthly target. If the monthly target is met/passed, the payout should then be sum actual for that month - monthly target for that month * 1%
.
Here is how the output data might look:
+------------+-------+----+--------+--------+
| yyyy_mm_dd | name | id | actual | payout |
+------------+-------+----+--------+--------+
| 2020-01-01 | Comp1 | 1 | 1000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-02 | Comp1 | 1 | 2000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-03 | Comp1 | 1 | 4000 | 10 |
+------------+-------+----+--------+--------+
| 2020-01-01 | Comp2 | 2 | 1000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-02 | Comp2 | 2 | 2000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-03 | Comp2 | 2 | 3000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-01 | Comp3 | 3 | 1000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-02 | Comp3 | 3 | 2000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-03 | Comp3 | 3 | 8000 | 50 |
+------------+-------+----+--------+--------+
| 2020-01-01 | Comp4 | 4 | 1000 | 0 |
+------------+-------+----+--------+--------+
| 2020-01-02 | Comp4 | 4 | 2000 | 0 |
+------------+-------+----+--------+--------+
| 2020-02-03 | Comp4 | 4 | 3000 | 0 |
+------------+-------+----+--------+--------+
All names/ids in the above dataset have a monthly target
of 6000. So there should only be a payout
when a name/id passes that target during the month. Comp1 and Comp3 both passed the monthly target on the third day of January so they get a payout from that day onwards until the end of the month. This then resets in February as it's a new month with a new target and we'll get new daily data as the month progresses.
What I've tried:
SELECT
agg.yyyy_mm_dd,
agg.name,
agg.id,
CASE WHEN agg.actual >= targets.target THEN ((agg.actual-targets.target)/100) * 1 ELSE 0 END AS payout
FROM(
SELECT
sum(x.actual) AS actual,
x.yyyy_mm_dd,
x.name,
x.id
FROM(
SELECT
yyyy_mm_dd,
name,
id,
cast(actual as int) as actual
FROM
schema.daily_data
WHERE
yyyy_mm_dd >= '2020-01-01' AND (name = 'Comp1' OR name = 'Comp2')
) x
GROUP BY
2,3,4
) agg
INNER JOIN(
SELECT
id,
month,
target
FROM
schema.targets
) targets ON targets.id = agg.id
GROUP BY
1,2,3,4
However, the above outputs multiple rows per name
. This is a result of the daily table having the same company multiple times per day (expected). I thought my grouping would have handled that. Also, I don't think it's the most simple solution overall and I'm probably overthinking it / could be done more efficiently.
I think I've now got a working solution. The below is giving the output as expected. It could probably be optimized a bit as it isn't the quickest.
SELECT
x.yyyy_mm_dd,
x.id,
x.name,
x.actual,
x.target,
x.actual_to_date,
CASE WHEN x.actual_to_date > x.target THEN ((x.actual_to_date - x.target) /100) * 1 ELSE 0 END AS payout
FROM(
SELECT
daily.yyyy_mm_dd,
daily.id,
daily.name,
daily.actual,
t.target,
SUM(daily.actual) OVER (PARTITION BY MONTH(daily.yyyy_mm_dd), daily.id ORDER BY daily.yyyy_mm_dd RANGE UNBOUNDED PRECEDING) AS actual_to_date
FROM(
SELECT
yyyy_mm_dd,
id,
name,
sum(cast(actual as int)) as actual
FROM
daily_data_table
WHERE
yyyy_mm_dd >= '2020-01-01'
GROUP BY
1,2,3
) daily
INNER JOIN
monthly_target_table t
ON t.id = daily.id AND t.month = month(daily.yyyy_mm_dd)
WHERE
daily.name = 'Comp1'
) x