Search code examples
sqlsql-servergroup-bysubquerywindow-functions

Group rows by dense_rank() and loop through each sub-group and compare another column in next row of that sub group?


I have tried the following in LINQPad:

create table users
(
    id int not null,
    startdate datetime not null,
    enddate datetime not null
)

go

insert into users(id, startdate, enddate) values(1, '01/01/2000', '01/02/2000')
insert into users(id, startdate, enddate) values(1, '01/03/2000', '01/04/2000')

insert into users(id, startdate, enddate) values(2, '01/01/2000', '01/02/2000')
insert into users(id, startdate, enddate) values(2, '01/03/2000', '01/04/2000')
insert into users(id, startdate, enddate) values(2, '01/05/2000', '01/06/2000')

insert into users(id, startdate, enddate) values(3, '01/01/2000', '01/02/2000')
insert into users(id, startdate, enddate) values(3, '01/03/2000', '01/04/2000')
insert into users(id, startdate, enddate) values(3, '01/06/2000', '01/07/2000')

insert into users(id, startdate, enddate) values(4, '01/01/2000', '01/02/2000')

go

select * from users 

go

// This query gave the result seen in the image

select id, startdate, enddate, rownum = dense_rank() over(partition by id order by enddate) from users

enter image description here

I want to write a query which will return only the IDs 1 and 2 (not 3 and 4) because:

  • ID 1 - has more than 1 rows and startdate of its rownum 2 is 1 day ahead of enddate of its rownum 1
  • ID 2 - has more than 1 rows and startdate of its rownum n + 1 is 1 day ahead of enddate of its rownum n
  • ID 3 - THOUGH has more than 1 rows, startdate of its rownum 3 is NOT 1 day ahead (but 2 days) of enddate of its rownum 2. Hence, it is not qualified
  • ID 4 - DOES NOT HAVE more than 1 rows. Hence, it is not qualified

Could you let me know how to get this result please?


Solution

  • You could use window function lag() to recover the previous enddate, then aggregation and filter in the having clause:

    select id
    from (
        select 
            t.*,
            lag(enddate) over(partition by id order by enddate) lag_enddate
        from users t
    ) t
    group by id 
    having 
        count(*) > 1
        and max(case 
            when lag_enddate is null or startdate = dateadd(day, 1, lag_enddate) 
            then 0 else 1 
        end) = 0
    

    Demo on DB Fiddle:

    | id |
    | -: |
    |  1 |
    |  2 |
    

    In archaic versions of SQL Server, that do not support window functions, you can emulate lag() with a correlated subquery:

    select id
    from (
        select 
            t.*,
            (select max(enddate) from users t1 where t1.id = t.id and t1.enddate < t.enddate) lag_enddate
        from users t
    ) t
    group by id 
    having 
        count(*) > 1
        and max(case when lag_enddate is null or startdate = dateadd(day, 1, lag_enddate) then 0 else 1 end) = 0
    

    Demo on DB Fiddle