Search code examples
sqlsql-server-2008-r2ranking

Select top ranked overlapping segment SQL


I am looking for a way to select the top ranked overlapping segments. Table would look something like this:

CODE   START                STOP                 RANK
shift  2016-07-20 05:00 AM  2016-07-20 08:00 AM  5 
late   2016-07-20 05:00 AM  2016-07-20 05:08 AM  1
break  2016-07-20 06:00 AM  2016-07-20 06:30 AM  2

This is what I would like my output to be:

CODE   START                STOP                 
late   2016-07-20 05:00 AM  2016-07-20 05:08 AM   
shift  2016-07-20 05:08 AM  2016-07-20 06:00 AM  
break  2016-07-20 06:00 AM  2016-07-20 06:30 AM
shift  2016-07-20 06:30 AM  2016-07-20 08:00 AM  

So pretty much I would only like to see what the top ranked segment is saying about this person's state, but if they didn't have any state other than the standard "shift" segment then just show that they are on their shift.

Does it makes sense? Please shoot away any questions or proposed solutions. I can't seem to think of anything at this moment. I can select the top ranked segments, but not when they are overlapping.

EDIT: As you can see on my desired output the shift segment gets overridden by the late segment which has a higher rank (lower number means higher rank, as usual in ranking) from 05:00 AM to 05:08 AM, but from 05:08 AM since there is no segment overridding it, we go back to our default segment shift from 05:08 AM to 06:00 AM.

Then there is a scheduled break segment from 06:00 AM and 06:30 AM which again overrides the shift segment. After this is finished we go back to our default segment shift from 06:30 AM to 08:00 AM when the shift ends.

I hope this makes sense.


Solution

  • Yay, a SQL puzzle, I can't resist! :D

    Here's one possible solution. I don't have SQLServer at hand (using my favorite database :)), but SQL should be mostly portable:

    create or replace table ranges(
            code varchar,
            beg timestamp_ntz,
            end timestamp_ntz,
            rank integer);
    insert into ranges values
            ('shift', '2016-07-20 05:00:00', '2016-07-20 08:00:00', 5),
            ('late',  '2016-07-20 05:00:00', '2016-07-20 05:00:08', 1),
            ('break', '2016-07-20 06:00:00', '2016-07-20 06:30:00', 2);
    
    WITH PERIODS AS (
      select beg, lead(beg, 1) over (order by beg) AS end
      from (select beg from ranges union select end from ranges)
    ),
    MATCHING_RANGES AS (
      select periods.beg, periods.end, ranges.code, ranges.rank
      from periods
      join ranges on (periods.beg >= ranges.beg and periods.end <= ranges.end)
      where periods.end is not null 
    ),
    RANKED_RANGES AS ( 
      select beg, end, code, row_number() over (partition by beg order by rank) in_period_rank 
      from matching_ranges 
    )
    select code, beg, end from ranked_ranges
    where in_period_rank = 1
    order by beg;
    
    -------+---------------------+---------------------+
     CODE  |         BEG         |         END         |
    -------+---------------------+---------------------+
     late  | 2016-07-20 05:00:00 | 2016-07-20 05:00:08 |
     shift | 2016-07-20 05:00:08 | 2016-07-20 06:00:00 |
     break | 2016-07-20 06:00:00 | 2016-07-20 06:30:00 |
     shift | 2016-07-20 06:30:00 | 2016-07-20 08:00:00 |
    -------+---------------------+---------------------+
    

    Explanation (I use "ranges" for your original table, and "periods" for slices of these, like what you want in the output):

    • In PERIODS we create all distinct moments in time, and use LAG to find the next one. Output is:

      ---------------------+---------------------+
               BEG         |         END         |
      ---------------------+---------------------+
       2016-07-20 05:00:00 | 2016-07-20 05:00:08 |
       2016-07-20 05:00:08 | 2016-07-20 06:00:00 |
       2016-07-20 06:00:00 | 2016-07-20 06:30:00 |
       2016-07-20 06:30:00 | 2016-07-20 08:00:00 |
       2016-07-20 08:00:00 | [NULL]              |
      ---------------------+---------------------+
      
    • Then in MATCHING_RANGES, for every "period" we find all possible ranges from the defined table (also removing the last row, NULL), output:

      ---------------------+---------------------+-------+------+
               BEG         |         END         | CODE  | RANK |
      ---------------------+---------------------+-------+------+
       2016-07-20 05:00:00 | 2016-07-20 05:00:08 | shift | 5    |
       2016-07-20 05:00:00 | 2016-07-20 05:00:08 | late  | 1    |
       2016-07-20 05:00:08 | 2016-07-20 06:00:00 | shift | 5    |
       2016-07-20 06:00:00 | 2016-07-20 06:30:00 | shift | 5    |
       2016-07-20 06:00:00 | 2016-07-20 06:30:00 | break | 2    |
       2016-07-20 06:30:00 | 2016-07-20 08:00:00 | shift | 5    |
      ---------------------+---------------------+-------+------+
      

      Note how that creates all combinations of ranges and periods that match

    • Then in RANKED_RANGES for each row we compute its priority within its period:

      ---------------------+---------------------+-------+----------------+
               BEG         |         END         | CODE  | IN_PERIOD_RANK |
      ---------------------+---------------------+-------+----------------+
       2016-07-20 05:00:00 | 2016-07-20 05:00:08 | late  | 1              |
       2016-07-20 05:00:00 | 2016-07-20 05:00:08 | shift | 2              |
       2016-07-20 05:00:08 | 2016-07-20 06:00:00 | shift | 1              |
       2016-07-20 06:00:00 | 2016-07-20 06:30:00 | break | 1              |
       2016-07-20 06:00:00 | 2016-07-20 06:30:00 | shift | 2              |
       2016-07-20 06:30:00 | 2016-07-20 08:00:00 | shift | 1              |
      ---------------------+---------------------+-------+----------------+
      
    • And then we simply select rows with rank 1 :)