Search code examples
sqldb2db2-400db2-luw

Reset DENSE_RANK() OVER partition by where just one row is different


The set of data that I have has gap between dates and with Lag() function I calculated the number of months that gap exists. Then in second query with dense_rank() I count active Months for each ID. However, as you see in the table the count resets where there is a gap but after,it continues with the previous count instead of the reset number.

Could you please help?

SELECT
    GP, ID, Date, 
    DENSE_RANK() OVER (PARTITION BY ENR.GP, ENR.ID, Age, ENR.DATEDIFF 
                       ORDER BY ENR.DATE ASC) AS Active_MOS
FROM
    (SELECT 
         GP, ID, 
         MONTHS_BETWEEN (DATE, LAG(DATE, 1) OVER (PARTITION BY GP, ID, CASE WHEN GNDR = 'M' AND AGE < 35 THEN 'YES' ELSE 'NO' END ORDER BY YRMO ASC )) AS DateDiff
     FROM 
         TABLE2)     

enter image description here


Solution

  • You have to give a number to every island and put it in the partition of dense_rank. If you set a value of 1 when a row doesn't follow the preceding one then the cumulative sum of that value can be used as the number of the island :

    with
    --table2 (gp, id, age, date) as (
    --  values
    --  (33, 1375, 1, date '2021-05-01'),
    --  (33, 1375, 2, date '2021-06-01'),
    --  (33, 1375, 2, date '2021-07-01'),
    --  (33, 1375, 2, date '2021-08-01'),
    --  (33, 1375, 2, date '2021-09-01'),
    --  (33, 1375, 2, date '2021-10-01'),
    --  (33, 1375, 2, date '2021-11-01'),
    --  (33, 1375, 2, date '2021-12-01'),
    --  (33, 1375, 2, date '2022-01-01'),
    --  (33, 1375, 2, date '2022-02-01'),
    --  (33, 1375, 2, date '2022-03-01'),
    --  (33, 1375, 2, date '2022-04-01'),
    --  (33, 1375, 2, date '2022-05-01'),
    --  (33, 1375, 2, date '2022-06-01'),
    --  (33, 1375, 2, date '2022-07-01'),
    --  (33, 1375, 2, date '2022-08-01'),
    --  (33, 1375, 2, date '2022-09-01'),
    --  (33, 1375, 2, date '2023-05-01'),
    --  (33, 1375, 2, date '2023-06-01'),
    --  (33, 1375, 2, date '2023-07-01')
    --),
    islands as (
      select
        t2.*,
        sum(
          case when t2."DATE" - 1 month
                    > lag(t2.date, 1) over(partition by gp, id, age order by date)
               then 1 else 0 end
        ) over(partition by gp, id, age order by date) island_no
      from table2 t2
    )
    select
      islands.*,
      dense_rank() over(partition by gp, id, age, island_no order by date) rank
    from islands