In the table below, 'ranking' was implemented with the RANK()
function, with something like:
RANK() OVER(PARTITION BY month, year) as ranking
month | year | car_type | model | sales | ranking |
---|---|---|---|---|---|
4 | 2020 | mercedes | s_class | 2000 | 1 |
5 | 2020 | mercedes | s_class | 1500 | 2 |
10 | 2020 | bmw | x5 | 7000 | 1 |
11 | 2020 | bmw | x5 | 6000 | 2 |
12 | 2020 | bmw | x5 | 5000 | 3 |
The desired outcome is to create a feature, e.g. 'best_sales_month', which will be equal to the number of the 'month' column where 'ranking'=1 for every respective distinct pair of 'car_type' and 'model'. It will look like this:
month | year | car_type | model | sales | ranking | best_sales_month |
---|---|---|---|---|---|---|
4 | 2020 | mercedes | s_class | 2000 | 1 | 4 |
5 | 2020 | mercedes | s_class | 1500 | 2 | 4 |
10 | 2020 | bmw | x5 | 7000 | 1 | 10 |
11 | 2020 | bmw | x5 | 6000 | 2 | 10 |
12 | 2020 | bmw | x5 | 5000 | 3 | 10 |
For example, for BMW x5 cars, 'best_sales_month' is 10 because sales are greater when month=10, for that pair of car_type and model.
Currently, I've reached this point:
CASE
-- when ranking=1, grab the value of 'month' for that entry:
WHEN ranking =1 THEN month
-- how to populate that number to the rest of the car_type & model pairs?
END AS best_sales_month
which ends up with this:
month | year | car_type | model | sales | ranking | best_sales_month |
---|---|---|---|---|---|---|
4 | 2020 | mercedes | s_class | 2000 | 1 | 4 |
5 | 2020 | mercedes | s_class | 1500 | 2 | NULL |
10 | 2020 | bmw | x5 | 7000 | 1 | 10 |
11 | 2020 | bmw | x5 | 6000 | 2 | NULL |
12 | 2020 | bmw | x5 | 5000 | 3 | NULL |
So essentially, how can I populate the NULL value rows with the 'month' value when 'ranking'=1, for every car_type and model pair?
Thanks in advance!
You may use first_value(expr)
analytic function that takes the first value of the expression expr
according to the ordering provided in analytical clause:
with input_tab(month, year, car_type, model, sales, ranking) as ( select 4, 2020, 'mercedes', 's_class', 2000, 1 union all select 5, 2020, 'mercedes', 's_class', 1500, 2 union all select 10, 2020, 'bmw', 'x5', 7000, 1 union all select 11, 2020, 'bmw', 'x5', 6000, 2 union all select 12, 2020, 'bmw', 'x5', 5000, 3 ) select *, first_value(month) over(partition by year, car_type, model order by sales desc) as best_sales_mon from input_tab
month | year | car_type | model | sales | ranking | best_sales_mon ----: | ---: | :------- | :------ | ----: | ------: | -------------: 10 | 2020 | bmw | x5 | 7000 | 1 | 10 11 | 2020 | bmw | x5 | 6000 | 2 | 10 12 | 2020 | bmw | x5 | 5000 | 3 | 10 4 | 2020 | mercedes | s_class | 2000 | 1 | 4 5 | 2020 | mercedes | s_class | 1500 | 2 | 4
Note, that it's the same for most of modern DBMS, so I've used Postgres for db<>fiddle