I am trying to solve a problem of "labor tickets" - where each ticket can have a clock in and clock out time.
But, if multiple tickets overlap, the time should be distributed proportionally.
Here's an example:
Single Ticket (no overlap)
StartTime | EndTime | Hours |
---|---|---|
6:00 AM | 7:00 AM | 1.0 |
Two tickets (Full overlap)
StartTime | EndTime | Hours |
---|---|---|
6:00 AM | 7:00 AM | 0.5 |
6:00 AM | 7:00 AM | 0.5 |
Three Tickets (Full overlap)
StartTime | EndTime | Hours |
---|---|---|
6:00 AM | 7:00 AM | 0.33 |
6:00 AM | 7:00 AM | 0.33 |
6:00 AM | 7:00 AM | 0.33 |
Two Tickets (Partial Overlap)
StartTime | EndTime | Hours |
---|---|---|
6:00 AM | 7:00 AM | 0.875 |
6:30 AM | 6:45 AM | 0.125 |
The overlapping time is divided amongst each ticket
Three Tickets (Complex Overlap)
StartTime | EndTime | Hours |
---|---|---|
6:00 AM | 7:00 AM | 0.33 |
5:30 AM | 7:30 AM | 1.08 |
5:00 AM | 7:00 AM | 1.08 |
Or more visually:
Here's a "manual" calculation of the three rows:
1 hour / 3 = 0.33
0.5 hours / 2 = 0.25
1/3 = 0.33
0.5/1 = 0.5
0.5 hours / 1 = 0.5
0.5 hours / 2 = 0.25
1 hour / 3 = 0.33
To make this simpler, I'm really only trying to make a stored procedure to calculate a single row at a time. So I don't have to worry about the overall recursion of this process.
I'm hoping there is some sort of mathematical formula that would work - I tried taking (current ticket time) divided by (total overlapping time) as a fraction and using that, but that didn't give me a good result on the partial overlaps.
Worst case, I might be able to do this with loops - chunk up our times into "blocks", calculate the time divisor in each blocks, and then add it up again. But, if I can find some sort of mathematical solution, that would be a lot cleaner.
I'm running in SQL Server 2019.
CRME
Here is some sample T-SQL to generate the complex example (SQL Fiddle):
DECLARE @LaborTickets TABLE (
TicketID INT IDENTITY(1,1),
StartTime DATETIME,
EndTime DATETIME
);
INSERT INTO @LaborTickets (StartTime, EndTime)
VALUES
('2025-02-04 06:00:00', '2025-02-04 07:00:00'),
('2025-02-04 05:30:00', '2025-02-04 07:30:00'),
('2025-02-04 05:00:00', '2025-02-04 07:00:00');
and the goal is to return results of the form:
StartTime EndTime Hours
------------------ ------------------ -------------------
2025-02-04 6:00 AM 2025-02-04 7:00 AM 0.33333333333333333
2025-02-04 5:30 AM 2025-02-04 7:30 AM 1.08333333333333333
2025-02-04 5:00 AM 2025-02-04 7:00 AM 1.08333333333333333
3 row(s) affected
Edit: In these examples, I am using a maximum of 3 tickets, with nice round number increments.
The real data is a lot more detailed, down to seconds, and there could be a large number of overlapping ones - perhaps up to 20.
Here's another, more complex example:
DECLARE @LaborTickets TABLE (
TicketID INT IDENTITY(1,1),
StartTime DATETIME,
EndTime DATETIME
);
INSERT INTO @LaborTickets (StartTime, EndTime)
VALUES
('2025-02-04 06:01:13', '2025-02-04 07:00:00'),
('2025-02-04 06:01:21', '2025-02-04 07:01:00'),
('2025-02-04 06:01:25', '2025-02-04 07:01:30'),
('2025-02-04 06:01:29', '2025-02-04 07:01:50'),
('2025-02-04 06:01:35', '2025-02-04 07:01:00'),
('2025-02-04 06:01:50', '2025-02-04 07:02:00');
This is not the correct answer, but I have to leave for a family event and didn't want to leave you hanging without something.
-- INIT database
CREATE TABLE tickets (
ticket_id INT,
start_time DATETIME,
end_time DATETIME
);
INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (1, '20240204 06:00:00 AM', '20240204 07:00:00 AM');
INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (1, '20240204 05:30:00 AM', '20240204 07:30:00 AM');
INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (1, '20240204 05:00:00 AM', '20240204 07:00:00 AM');
INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (2, '20240204 06:00:00 AM', '20240204 07:00:00 AM');
INSERT INTO tickets(ticket_id, start_time, end_time) VALUES (2, '20240204 06:30:00 AM', '20240204 06:45:00 AM');
-- First CTE: Generate a time series for each ticket_id, this was annoying there must be another way?
WITH TimeSeries AS (
SELECT
ticket_id,
MIN(start_time) AS min_time,
MAX(end_time) AS max_time
FROM tickets
GROUP BY ticket_id
),
TimeIntervals AS (
SELECT
ticket_id,
min_time AS start_interval,
DATEADD(MINUTE, 30, min_time) AS end_interval,
max_time
FROM TimeSeries
UNION ALL
SELECT
ticket_id,
end_interval,
DATEADD(MINUTE, 30, end_interval),
max_time
FROM TimeIntervals
WHERE DATEADD(MINUTE, 30, end_interval) <= max_time
),
OpenTickets AS (
SELECT
ti.ticket_id,
ti.start_interval,
ti.end_interval,
COUNT(*) AS open_tickets_count,
0.50 / COUNT(*) AS weight_per_ticket
FROM TimeIntervals ti
INNER JOIN tickets t
ON t.ticket_id = ti.ticket_id
AND t.start_time < ti.end_interval
AND t.end_time >= ti.start_interval
GROUP BY ti.ticket_id, ti.start_interval, ti.end_interval
)
SELECT
t.ticket_id,
t.start_time,
t.end_time,
SUM(ot.weight_per_ticket) AS total_weight
FROM tickets t
INNER JOIN OpenTickets ot
ON t.ticket_id = ot.ticket_id
AND t.start_time < ot.end_interval
AND t.end_time > ot.start_interval
GROUP BY t.ticket_id, t.start_time, t.end_time
ORDER BY t.ticket_id, t.start_time;
I know the concept of ticket_id is wrong but that's what I was doing to differentiate between your examples.