Search code examples
mysqlwindow-functionsmysql-8.0

How to perform window function with where clause?


From the columns 1-4 in the table below I'd like to create a query that can return columns 5 & 6:

my_date cat_1_id cat_2_id my_value cat_2_id_1_value cat_2_id_2_value
01/01/2024 1 1 1 1 Null
01/01/2024 2 1 2 2 Null
02/01/2024 1 2 3 1 3
02/01/2024 2 2 4 2 4
03/01/2024 1 1 5 5 3
03/01/2024 2 1 6 6 4

The logic of the query:

This is in essence a pivoting of data contained within columns 1-4.

Column 5:

  • If cat_2_id = 1 then display my_value for that row
  • If cat_2_id != 1 then display my_value for the last row where cat_id_1 matches the current row and cat_2_id = 1

Column 6:

This is the same principle as column 5 just for cat_2_id = 2. So:

  • If cat_2_id = 2 then display my_value for that row
  • If cat_2_id != 2 then display my_value for the last row where cat_id_1 matches the current row and cat_2_id = 2

Current progress:

I've got as far as a solution that involves CASE to:

  • get my_value if cat_2_id condition is met, or;
  • get my_value for the last row using LAST_VALUE window function along with a combination of PARTITION BY (cat_1_id) and ORDER BY (my_date) and RANGE BETWEEN.

However, I can't get my head around how to build in the additional filter within the window function to ensure the cat_2_id condition is met for the last row.

I found this post that uses FILTER for postgresql but I couldn't find this in the MySQL docs (plus it says it isn't implemented for non-aggregate window functions).

If there isn't a window function way of doing this then I'd be happy with anything that does the job!

Here's the data to recreate:

CREATE TABLE test.pivot_cols (
    pivot_by_cols_id INT AUTO_INCREMENT PRIMARY KEY,
    my_date DATE NOT NULL, 
    cat_1_id INT NOT NULL, 
    cat_2_id INT NOT NULL, 
    my_value INT NOT NULL
);
INSERT INTO `test`.`pivot_cols` (`my_date`, `cat_1_id`, `cat_2_id`, `my_value`) VALUES ('2000-01-01', '1', '1', '1');
INSERT INTO `test`.`pivot_cols` (`my_date`, `cat_1_id`, `cat_2_id`, `my_value`) VALUES ('2000-01-01', '2', '1', '2');
INSERT INTO `test`.`pivot_cols` (`my_date`, `cat_1_id`, `cat_2_id`, `my_value`) VALUES ('2000-01-02', '1', '2', '3');
INSERT INTO `test`.`pivot_cols` (`my_date`, `cat_1_id`, `cat_2_id`, `my_value`) VALUES ('2000-01-02', '2', '2', '4');
INSERT INTO `test`.`pivot_cols` (`my_date`, `cat_1_id`, `cat_2_id`, `my_value`) VALUES ('2000-01-02', '1', '1', '5');
INSERT INTO `test`.`pivot_cols` (`my_date`, `cat_1_id`, `cat_2_id`, `my_value`) VALUES ('2000-01-02', '2', '1', '6');

Solution

  • You may begin with a correlated subquery which is easier to implement and understand:

    select *, case when cat_2_id = 1 then my_value else (
      select my_value
      from pivot_cols as x
      where cat_1_id = pivot_cols.cat_1_id
      and cat_2_id = 1
      and my_date < pivot_cols.my_date
      order by my_date desc
      limit 1
    ) end as id_1_value, case when cat_2_id = 2 then my_value else (
      select my_value
      from pivot_cols as x
      where cat_1_id = pivot_cols.cat_1_id
      and cat_2_id = 2
      and my_date < pivot_cols.my_date
      order by my_date desc
      limit 1
    ) end as id_2_value
    from pivot_cols
    order by my_date, cat_1_id
    

    If you must use window functions, it could have been straight forward if MySQL supported ignore nulls in window functions. Sadly it doesn't so you need to use some trick. In the following query I use max() over () to find the prior date conditionally. Additional joins are used to fetch the values:

    with cte as (
      select
        *,
        max(case when cat_2_id = 1 then my_date end) over prev_rows as date_1,
        max(case when cat_2_id = 2 then my_date end) over prev_rows as date_2
      from pivot_cols
      window prev_rows as (
        partition by cat_1_id
        order by my_date rows between unbounded preceding and 1 preceding
      )
    )
    select
      cte.*,
      case when cte.cat_2_id = 1 then cte.my_value else j1.my_value end as id_1_value,
      case when cte.cat_2_id = 2 then cte.my_value else j2.my_value end as id_2_value
    from cte
    left join pivot_cols as j1 on cte.cat_1_id = j1.cat_1_id and cte.date_1 = j1.my_date 
    left join pivot_cols as j2 on cte.cat_1_id = j2.cat_1_id and cte.date_2 = j2.my_date
    order by cte.my_date, cte.cat_1_id
    

    Demo on DB<>Fiddle