Search code examples
sql-servermergerangeoverlaprectangles

Packing Ranges in Two Dimensions


My Problem

Consider a set of data with two intervals. For instance, consider a student schedule of classes. Each record has a begin and end date, and each class has a period start time and a period end time. But this schedule is not 'normalized' in the sense that some records overlap. So if you search for records encompassing a given date and period for a student, you might get multiple matches.

Here's a contrived example. I represent the dates as integers to simplify the problem:

declare @schedule table (
    student char(3),
    fromDate int,
    toDate int,
    fromPeriod int,
    toPeriod int
)

insert @schedule values
    ('amy', 1, 7, 7, 9),
    ('amy', 3, 9, 5, 8), 
    ('amy', 10, 12, 1, 3), 
    ('ted', 1, 5, 11, 14),
    ('ted', 7, 11, 13, 16); 

Amy's date and period ranges either overlap or are adjacent. If I queried for 'date 5 period 7', I would get two matches. I need these reworked so that they represent the same 'area' but no longer overlap.

Ted's periods overlap but his dates do not. This means there's no real overlap, so no need to re-work anything.

My Research

I've read many posts and some articles on working overlapping intervals. Namely:

I've implemented one from Itzik from a blog entitled 'solutions-packing-date-and-time-intervals-puzzle' that has worked great for one particular project. I don't think it's a stable link, but I've found a copy of it here.

But I'm having difficulty extending the knowledge in those resources to my problem at hand. It might be my limitation. I have trouble following them. I've studied Itzik's solution and have come to understand a lot of it, but I do recall there's one piece I just couldn't understand. Or it might be that those solutions only work with singular ranges.

My Attempt

I resolved this question by treating the ranges as literal rectangle objects. It works. I've even made a version of it somewhat performant in my own application. So I'll post it as a solution in case it is of use to anyone with the same issue.

But it is so long and involved and there are enough quirks to it (e.g. buffering lines, looping shapes, working with float values, rounding issues) that I can't help but think that there's a much better way. Can the concepts of my listed resources be extended to dual ranges? Or do some SRID's allow cutting rectangles with zero-length lines?

Expected Results:

There is no one answer to this problem, because you can aggregate ranges and deconstruct them in different ways. But to minimize the number of resulting rectangles, there are really only two acceptable answers. Visually, with dates on the X axis and periods on the Y axis, overlapping ranges can start out like this:

 +------------+
 |            |
 |    +------------+
 |    ||||||||     |  <- 2 overlapping rectangles
 +----|            |
      |            |
      +------------+

We can rework it this way:

 +---+ +-----+
 |   | |     |
 |   | |     | +---+  <- 3 non-overlapping 
 |   | |     | |   |     vertically cut rectangles
 +---| |     | |   |
       |     | |   |
       +-----+ +---+

Or this way:

 +-----------+
 +-----------+

 +-----------------+  <- 3 non-overlapping 
 +-----------------+     horizontally cut rectangles

       +-----------+
       +-----------+

Going with vertical cuts, the results would look like this:

+-------------------------------------------+
|student|fromDate|toDate|fromPeriod|toPeriod|
|-------------------------------------------|
|amy    |1       |2     |7         |9       |
|amy    |3       |7     |5         |9       |
|amy    |8       |9     |5         |8       |
|amy    |10      |12    |1         |3       |
|ted    |1       |5     |11        |14      |
|ted    |7       |11    |13        |16      |
+-------------------------------------------+

Going with horizontal cuts, the results would look like this:

+-------------------------------------------+
|student|fromDate|toDate|fromPeriod|toPeriod|
|-------------------------------------------|
|amy    |1       |7     |9         |9       |
|amy    |1       |9     |7         |8       |
|amy    |3       |9     |5         |6       |
|amy    |10      |12    |1         |3       |
|ted    |1       |5     |11        |14      |
|ted    |7       |11    |13        |16      |
+-------------------------------------------+

Either is acceptable. Though, to keep it deterministic and tractable, you would want to choose one strategy and stick with it.


Solution

  • I had the pleasure of attending one of Itzik's trainings, and presented this to him. I only asked for possible references but he provided the solution! Presented here with his verbal permission. Some small modifications for explanatory value and to work with the existing numbers table in the question. Any errors as a result are of course my own:

    WITH pixellateAndBinPeriods AS (
        SELECT
            s.student,
            date = dates.i,
            period = periods.i,
    
            -- periods.i increments by 1 if the period before it is consecutive, by more than 1 otherwise
            -- denserank increments by 1 no mater what
            -- subtracting out the denserank produces the same number when consecutive, a different number when not
            periodBin = 
                periods.i -- increments in order if consecutive
                - DENSE_RANK() OVER (PARTITION BY s.student, dates.i ORDER BY periods.i) -- increments in order no matter what
                -- subtracting in these conditions creates an distinct identifier for consecutive series
    
        FROM @schedule AS s
        INNER JOIN #numbers dates 
            ON dates.i >= s.fromdate 
            AND dates.i <= s.todate
        INNER JOIN #numbers periods 
            ON periods.i >= s.fromperiod 
            AND periods.i <= s.toperiod
    ),
    packPeriodsAndBinDates AS (
        SELECT
            student,
            date,
            fromperiod = MIN(period),
            toperiod = MAX(period),
    
            -- Same logic as periodBin, but also partition by from and to period to group together 
            -- consecutive dates only if they have the same period ranges.
            dateBin = date - DENSE_RANK() OVER (PARTITION BY student, MIN(period), MAX(period) ORDER BY date)
    
        FROM pixellateAndBinPeriods
        GROUP BY
            student,
            date,
            periodBin
    )
    -- pack the dates
    SELECT
        student,
        fromdate = MIN(date),
        todate = MAX(date),
        fromperiod,
        toperiod
    INTO #normalized
    FROM packPeriodsAndBinDates
    GROUP BY
        student,
        fromperiod,
        toperiod,
        dateBin
    ORDER BY
        student,
        fromdate,
        fromperiod;
    

    Unnormalized Schedule:

    schedule

    Pixelation:

    #pixellateAndBinPeriods

    Packing by Period:

    packPeriodsAndBinDates

    Packing by Date:

    enter image description here