Search code examples
sqlgaps-and-islandsanalytic-functions

SQL group data taking date sequence into account


I am trying to group separate rows which might be broken by a different group. If they are broken I want them to group separately.

So I have:

Col1 | Col2
---------------------
| Y  |01/JAN/2012
| Y  |01/FEB/2012
| N  |01/MAR/2012
| Y  |01/APR/2012
| Y  |01/MAY/2012

I want to get the result:

|col1|col2       |GRP
---------------------
| Y  |01/JAN/2012|1
| Y  |01/FEB/2012|1
| N  |01/MAR/2012|2
| Y  |01/APR/2012|3
| Y  |01/MAY/2012|3

How can I achieve this?

My current attempt is this:

select
    Col1,
    Col2,
    dense_rank() over (partition by Col1 order by Col2 asc) as grp  
from
    myTABLE
;

but that groups all the 'Y's together and gives me a sequential number like this:

|col1|col2       |GRP
---------------------
| Y  |01/JAN/2012|1
| Y  |01/FEB/2012|2
| N  |01/MAR/2012|1
| Y  |01/APR/2012|3
| Y  |01/MAY/2012|4

Solution

  • This is a form of gaps-and-islands problem. I would recommend using the difference of row numbers to identify the "island"s and then row_number():

    select t.*, dense_rank() over (order by grp) as grp
    from (select t.*,
                 min(col2) over (partition by col1, seqnum - seqnum_2) as grp
          from (select t.*,
                       row_number() over (order by col2) as seqnum,
                       row_number() over (partition by col1 order by col2) as seqnum_2
                from t
               ) t
         ) t
    order by col2;
    

    Actually, an easier way is to use lag() and a cumulative sum:

    select t.*,
           sum(case when col1 = prev_col1 then 0 else 1 end) over (order by col2) as grp
    from (select t.*, lag(col1) over (partition by col2) as prev_col1
          from t
         ) t