Search code examples
sqlsql-servermathtimetime-and-attendance

A mathematical way to divide overlapping time ranges into hours


I am trying to solve a problem of "labor tickets" - where each ticket can have a clock in and clock out time.

But, if multiple tickets overlap, the time should be distributed proportionally.

Here's an example:

Single Ticket (no overlap)

StartTime EndTime Hours
6:00 AM 7:00 AM 1.0

Two tickets (Full overlap)

StartTime EndTime Hours
6:00 AM 7:00 AM 0.5
6:00 AM 7:00 AM 0.5

Three Tickets (Full overlap)

StartTime EndTime Hours
6:00 AM 7:00 AM 0.33
6:00 AM 7:00 AM 0.33
6:00 AM 7:00 AM 0.33

Two Tickets (Partial Overlap)

StartTime EndTime Hours
6:00 AM 7:00 AM 0.875
6:30 AM 6:45 AM 0.125

The overlapping time is divided amongst each ticket

Three Tickets (Complex Overlap)

StartTime EndTime Hours
6:00 AM 7:00 AM 0.33
5:30 AM 7:30 AM 1.08
5:00 AM 7:00 AM 1.08

Or more visually:

enter image description here

Here's a "manual" calculation of the three rows:

  1. 6:00 AM-7:00 AM: There are 3 jobs, so that's 1 hour / 3 = 0.33
  2. 5:30 AM-6:00 AM: There are 2 jobs, so that's 0.5 hours / 2 = 0.25
  3. 6:00 AM-7:00 AM: There are 3 jobs so that's 1/3 = 0.33
  4. 7:00 AM-7:30 AM: Only 1 job so that's 0.5/1 = 0.5
  5. 5:00 AM-7:00 AM:
    • 5:00 AM-5:30 AM: Only 1 job, so 0.5 hours / 1 = 0.5
    • 5:30 AM-6:00 AM: 2 jobs, so 0.5 hours / 2 = 0.25
    • 6:00 AM-7:00 AM: 3 jobs, so 1 hour / 3 = 0.33

To make this simpler, I'm really only trying to make a stored procedure to calculate a single row at a time. So I don't have to worry about the overall recursion of this process.

I'm hoping there is some sort of mathematical formula that would work - I tried taking (current ticket time) divided by (total overlapping time) as a fraction and using that, but that didn't give me a good result on the partial overlaps.

Worst case, I might be able to do this with loops - chunk up our times into "blocks", calculate the time divisor in each blocks, and then add it up again. But, if I can find some sort of mathematical solution, that would be a lot cleaner.

I'm running in SQL Server 2019.

CRME

Here is some sample T-SQL to generate the complex example (SQL Fiddle):

DECLARE @LaborTickets TABLE (
    TicketID INT IDENTITY(1,1),
    StartTime DATETIME,
    EndTime DATETIME
);

INSERT INTO @LaborTickets (StartTime, EndTime)
VALUES
    ('2025-02-04 06:00:00', '2025-02-04 07:00:00'),
    ('2025-02-04 05:30:00', '2025-02-04 07:30:00'),
    ('2025-02-04 05:00:00', '2025-02-04 07:00:00');

and the goal is to return results of the form:

StartTime           EndTime             Hours
------------------  ------------------  -------------------
2025-02-04 6:00 AM  2025-02-04 7:00 AM  0.33333333333333333
2025-02-04 5:30 AM  2025-02-04 7:30 AM  1.08333333333333333
2025-02-04 5:00 AM  2025-02-04 7:00 AM  1.08333333333333333

3 row(s) affected

Edit: In these examples, I am using a maximum of 3 tickets, with nice round number increments.

The real data is a lot more detailed, down to seconds, and there could be a large number of overlapping ones - perhaps up to 20.

Here's another, more complex example:

DECLARE @LaborTickets TABLE (
    TicketID INT IDENTITY(1,1),
    StartTime DATETIME,
    EndTime DATETIME
);

INSERT INTO @LaborTickets (StartTime, EndTime)
VALUES
    ('2025-02-04 06:01:13', '2025-02-04 07:00:00'),
    ('2025-02-04 06:01:21', '2025-02-04 07:01:00'),
    ('2025-02-04 06:01:25', '2025-02-04 07:01:30'),
    ('2025-02-04 06:01:29', '2025-02-04 07:01:50'),
    ('2025-02-04 06:01:35', '2025-02-04 07:01:00'),
    ('2025-02-04 06:01:50', '2025-02-04 07:02:00');

Solution

  • This is not the correct answer, but I have to leave for a family event and didn't want to leave you hanging without something.

    -- INIT database
    CREATE TABLE tickets (
      ticket_id INT,
      start_time DATETIME,
      end_time DATETIME
    );
    
    INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (1, '20240204 06:00:00 AM', '20240204 07:00:00 AM');
    INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (1, '20240204 05:30:00 AM', '20240204 07:30:00 AM');
    INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (1, '20240204 05:00:00 AM', '20240204 07:00:00 AM');
    INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (2, '20240204 06:00:00 AM', '20240204 07:00:00 AM');
    INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (2, '20240204 06:30:00 AM', '20240204 06:45:00 AM');
    
    -- First CTE: Generate a time series for each ticket_id, this was annoying there must be another way?
    WITH TimeSeries AS (
        SELECT 
            ticket_id,
            MIN(start_time) AS min_time,
            MAX(end_time) AS max_time
        FROM tickets
        GROUP BY ticket_id
    ),
    TimeIntervals AS (
        SELECT 
            ticket_id, 
            min_time AS start_interval, 
            DATEADD(MINUTE, 30, min_time) AS end_interval,
            max_time
        FROM TimeSeries
        UNION ALL
        SELECT 
            ticket_id, 
            end_interval, 
            DATEADD(MINUTE, 30, end_interval),
            max_time
        FROM TimeIntervals
        WHERE DATEADD(MINUTE, 30, end_interval) <= max_time
    ),
    
    OpenTickets AS (
        SELECT 
            ti.ticket_id,
            ti.start_interval,
            ti.end_interval,
            COUNT(*) AS open_tickets_count,
            0.50 / COUNT(*) AS weight_per_ticket  
        FROM TimeIntervals ti
        INNER JOIN tickets t
            ON t.ticket_id = ti.ticket_id
            AND t.start_time < ti.end_interval
            AND t.end_time >= ti.start_interval
        GROUP BY ti.ticket_id, ti.start_interval, ti.end_interval
    )
    SELECT 
        t.ticket_id, 
        t.start_time, 
        t.end_time, 
        SUM(ot.weight_per_ticket) AS total_weight
    FROM tickets t
    INNER JOIN OpenTickets ot
        ON t.ticket_id = ot.ticket_id
        AND t.start_time < ot.end_interval 
        AND t.end_time > ot.start_interval 
    GROUP BY t.ticket_id, t.start_time, t.end_time
    ORDER BY t.ticket_id, t.start_time;
    

    I know the concept of ticket_id is wrong but that's what I was doing to differentiate between your examples.