Search code examples
sqlgoogle-bigqueryconditional-statementspartitioningranking

How to Insert a new Column in SQL based on Window Function Ranking and Condition


In the table below, 'ranking' was implemented with the RANK() function, with something like:

RANK() OVER(PARTITION BY month, year) as ranking
month year car_type model sales ranking
4 2020 mercedes s_class 2000 1
5 2020 mercedes s_class 1500 2
10 2020 bmw x5 7000 1
11 2020 bmw x5 6000 2
12 2020 bmw x5 5000 3

The desired outcome is to create a feature, e.g. 'best_sales_month', which will be equal to the number of the 'month' column where 'ranking'=1 for every respective distinct pair of 'car_type' and 'model'. It will look like this:

month year car_type model sales ranking best_sales_month
4 2020 mercedes s_class 2000 1 4
5 2020 mercedes s_class 1500 2 4
10 2020 bmw x5 7000 1 10
11 2020 bmw x5 6000 2 10
12 2020 bmw x5 5000 3 10

For example, for BMW x5 cars, 'best_sales_month' is 10 because sales are greater when month=10, for that pair of car_type and model.

Currently, I've reached this point:

CASE 
  -- when ranking=1, grab the value of 'month' for that entry:
  WHEN ranking =1 THEN month 
  -- how to populate that number to the rest of the car_type & model pairs?
END AS best_sales_month 

which ends up with this:

month year car_type model sales ranking best_sales_month
4 2020 mercedes s_class 2000 1 4
5 2020 mercedes s_class 1500 2 NULL
10 2020 bmw x5 7000 1 10
11 2020 bmw x5 6000 2 NULL
12 2020 bmw x5 5000 3 NULL

So essentially, how can I populate the NULL value rows with the 'month' value when 'ranking'=1, for every car_type and model pair?

Thanks in advance!


Solution

  • You may use first_value(expr) analytic function that takes the first value of the expression expr according to the ordering provided in analytical clause:

    with input_tab(month, year, car_type, model, sales, ranking) as (
    select 4, 2020, 'mercedes', 's_class', 2000, 1 union all
    select 5, 2020, 'mercedes', 's_class', 1500, 2 union all
    select 10, 2020, 'bmw', 'x5', 7000, 1 union all
    select 11, 2020, 'bmw', 'x5', 6000, 2 union all
    select 12, 2020, 'bmw', 'x5', 5000, 3
    )
    
    select
      *,
      first_value(month)
        over(partition by year, car_type, model order by sales desc) as best_sales_mon
    from input_tab
    
    month | year | car_type | model   | sales | ranking | best_sales_mon
    ----: | ---: | :------- | :------ | ----: | ------: | -------------:
       10 | 2020 | bmw      | x5      |  7000 |       1 |             10
       11 | 2020 | bmw      | x5      |  6000 |       2 |             10
       12 | 2020 | bmw      | x5      |  5000 |       3 |             10
        4 | 2020 | mercedes | s_class |  2000 |       1 |              4
        5 | 2020 | mercedes | s_class |  1500 |       2 |              4
    

    Note, that it's the same for most of modern DBMS, so I've used Postgres for db<>fiddle