Search code examples
sqlsql-serversql-rank

Calculate Rank Based on Shared Column Values and Consecutive Date Ranges (same rank for records with consecutive range)


I am trying to get the rank of a table that has specific id's and a start and end date for each record, as such:

id1 id2 flag startdate enddate
1 1 y 2007-01-10 2007-02-12
1 1 y 2007-02-13 2007-08-04
1 1 y 2007-08-05 2008-10-04
1 1 n 2008-10-05 2008-11-14
1 1 n 2008-11-15 2008-12-02
1 1 n 2008-12-08 2008-12-20
2 2 y 2012-01-10 2012-02-12
2 2 y 2012-02-13 2012-08-04
2 3 y 2012-01-10 2012-02-14
2 4 y 2012-08-14 2013-01-10
2 4 y 2013-01-15 2013-01-26
2 4 y 2013-01-27 2013-02-04
2 4 n 2016-03-14 2016-04-12

Where I essentially want to give the same count value to all records which share the same id1, id2, and flag, and are consecutive in their dates. Consecutive, meaning the start date of one record is equal to the end date of the previous record + 1 day. The desired output should look like:

id1 id2 flag startdate enddate rank_t
1 1 y 2007-01-10 2007-02-12 1
1 1 y 2007-02-13 2007-08-04 1
1 1 y 2007-08-05 2008-10-04 1
1 1 n 2008-10-05 2008-11-14 2
1 1 n 2008-11-15 2008-12-02 2
1 1 n 2008-12-08 2008-12-20 3
2 2 y 2012-01-10 2012-02-12 4
2 2 y 2012-02-13 2012-08-04 4
2 3 y 2012-01-10 2012-02-14 5
2 4 y 2012-08-14 2013-01-10 6
2 4 y 2013-01-15 2013-01-26 7
2 4 y 2013-01-27 2013-02-04 7
2 4 n 2016-03-14 2016-04-12 8

The output or rank does not have to be in that exact order, but the idea is still the same. Records which share the same id1, id2, and flag, and are consecutive in their dates should all have the same rank. And that rank value should not be used again for any other 'group' of records.

Here is the code to generate a temp table with this structure:

if object_id('tempdb..#temp1') is not null drop table #temp1
CREATE TABLE #temp1 (id1 INT, id2 int, flag varchar(10), startdate DATETIME, enddate DATETIME)
INSERT INTO #temp1 values
(1, 1, 'y', '2007-01-10', '2007-02-12'),
(1, 1, 'y', '2007-02-13', '2007-08-04'),
(1, 1,'y', '2007-08-05', '2008-10-04'),
(1, 1,'n', '2008-10-05', '2008-11-14'),
(1, 1,'n', '2008-11-15', '2008-12-02'),
(1, 1,'n', '2008-12-08', '2008-12-20'),
(2, 2,'y', '2012-01-10', '2012-02-12'),
(2, 2,'y', '2012-02-13', '2012-08-04'),
(2, 3,'y', '2012-01-10', '2012-02-14'),
(2, 4,'y', '2012-08-14', '2013-01-10'),
(2, 4,'y', '2013-01-15', '2013-01-26'),
(2, 4,'y', '2013-01-27', '2013-02-04'),
(2, 4,'n', '2016-03-14', '2016-04-12')

Thanks in advance for any help.


Solution

  • Same logic as existing answer... just done as 2 CTEs (which I find clearer) than a combination of CTE+Sub-query.

    with cte1 as (
        select *
            --  Identify if there is a gap between the current startdate and the previous enddate
            , case when lag(enddate,1,dateadd(day,-1,startdate)) over (partition by id1, id2, flag order by startdate asc) = dateadd(day,-1,startdate) then 0 else 1 end DateGap
        from #temp1
    ), cte2 as (
        select *
            -- Sum every time a gap is detected to generate a new partition
            , sum(DateGap) over (order by startdate asc) DateGapSum
        from cte1
    )
    select id1, id2, flag, startdate, enddate
        -- Use dense_rank to generate the ranking where ties are allocated the same value
        , dense_rank() over (order by id1 asc, id2 asc, flag desc, DateGapSum asc) rank_t
    from cte2
    order by id1, id2, startdate;