I'm on Postgres 15 and using the multirange type to aggregate overlapping time ranges into islands:
CREATE TABLE time_entries (
id bigint NOT NULL,
contract_id bigint,
"from" timestamp(6) without time zone,
"to" timestamp(6) without time zone,
type varchar,
range tsrange GENERATED ALWAYS AS (tsrange("from", "to")) STORED
);
INSERT INTO time_entries VALUES (1, 1, '2022-12-07T09:00', '2022-12-07T10:00', 'billed');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T08:00', '2022-12-07T10:30', 'punch_clock');
INSERT INTO time_entries VALUES (1, 1, '2022-12-07T12:00', '2022-12-07T12:30', 'billed');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T11:30', '2022-12-07T12:15', 'punch_clock');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T13:00', '2022-12-07T13:30', 'billed');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T13:15', '2022-12-07T13:45', 'punch_clock');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T14:00', '2022-12-07T15:00', 'punch_clock');
SELECT contract_id, unnest(range_agg(range)) AS range FROM time_entries GROUP BY contract_id;
Working db<>fiddle: https://dbfiddle.uk/V9a7H8nJ
This results in these merged time ranges:
contract_id | range |
---|---|
1 | ["2022-12-07 08:00:00","2022-12-07 10:30:00") |
1 | ["2022-12-07 11:30:00","2022-12-07 12:30:00") |
1 | ["2022-12-07 13:00:00","2022-12-07 13:45:00") |
1 | ["2022-12-07 14:00:00","2022-12-07 15:00:00") |
But now, I need another metric: The unbilled hours of a tracking. I have two types of entries punch_clock
and billed
. The former is a tracker that runs in the background when they are working, and the latter is a parallel tracker that runs for specific projects.
How could I calculate the difference between the overlapping punch_clock
and billed
entries?
The desired result would be:
contract_id | unbilled |
---|---|
1 | ["2022-12-07 08:00:00","2022-12-07 09:00:00") |
1 | ["2022-12-07 10:00:00","2022-12-07 10:30:00") |
1 | ["2022-12-07 11:30:00","2022-12-07 12:00:00") |
1 | ["2022-12-07 13:30:00","2022-12-07 13:45:00") |
1 | ["2022-12-07 14:00:00","2022-12-07 15:00:00") |
I played around by grouping by type
, but it seems there is no range_difference_agg
aggregation function for multiranges, only range_agg
for union and range_intersect_agg
for intersections: https://www.postgresql.org/docs/current/functions-aggregate.html
First I suggest you to change the definition of the time_entries
table so that to replace the type tsrange
of the range
column by tsmultirange
:
CREATE TABLE time_entries (
id bigint NOT NULL,
contract_id bigint,
"from" timestamp(6) without time zone,
"to" timestamp(6) without time zone,
type varchar,
range tsmultirange GENERATED ALWAYS AS (tsmultirange(tsrange("from", "to"))) STORED
);
Then, according to the data sample you provide, you don't need an aggregate function. A self-join query should provide the expected result :
SELECT p.contract_id, unnest(CASE WHEN b.range IS NULL THEN p.range ELSE p.range - b.range END) AS unbilled
FROM ( SELECT contract_id, range FROM time_entries WHERE type = 'punch_clock') AS p
LEFT JOIN (SELECT contract_id, range FROM time_entries WHERE type = 'billed') AS b
ON p.contract_id = b.contract_id
AND p.range && b.range
Result :
contract_id | unbilled |
---|---|
1 | ["2022-12-07 08:00:00","2022-12-07 09:00:00") |
1 | ["2022-12-07 10:00:00","2022-12-07 10:30:00") |
1 | ["2022-12-07 11:30:00","2022-12-07 12:00:00") |
1 | ["2022-12-07 13:30:00","2022-12-07 13:45:00") |
1 | ["2022-12-07 14:00:00","2022-12-07 15:00:00") |
This query will work while only one 'billed' timerange intersects with one 'punch_clock' time range. If several 'billed' timeranges may intersect the same 'punch_clock' time range, then you will need to create your own aggregate function based on the multirange difference operator :
CREATE OR REPLACE FUNCTION multirange_diff (x anymultirange, y anymultirange, z anymultirange)
RETURNS anymultirange LANGUAGE sql IMMUTABLE AS $$
SELECT CASE WHEN x IS NULL THEN COALESCE(y-z, y) ELSE COALESCE(x-z, x) END ; $$ ;
CREATE OR REPLACE AGGREGATE multirange_diff_agg(anymultirange, anymultirange)
( stype = anymultirange, sfunc = multirange_diff) ;
The query using the new aggregate function is :
SELECT p.contract_id, unnest(multirange_diff_agg(p.range, b.range)) AS unbilled
FROM ( SELECT contract_id, range FROM time_entries WHERE type = 'punch_clock') AS p
LEFT JOIN (SELECT contract_id, range FROM time_entries WHERE type = 'billed') AS b
ON p.contract_id = b.contract_id
AND p.range && b.range
GROUP BY p.contract_id, p.range
As an example, after having inserted the new row in table time_entries
:
INSERT INTO time_entries VALUES (1, 1, '2022-12-07T10:10', '2022-12-07T10:20', 'billed');
The result of the above query is :
contract_id | unbilled |
---|---|
1 | ["2022-12-07 08:00:00","2022-12-07 09:00:00") |
1 | ["2022-12-07 10:00:00","2022-12-07 10:10:00") |
1 | ["2022-12-07 10:20:00","2022-12-07 10:30:00") |
1 | ["2022-12-07 11:30:00","2022-12-07 12:00:00") |
1 | ["2022-12-07 13:30:00","2022-12-07 13:45:00") |
1 | ["2022-12-07 14:00:00","2022-12-07 15:00:00") |
UPDATE
Even when several 'billed' timeranges intersect the same 'punch_clock' time range, you don't need an aggregate function. The following query should provide the expected result :
SELECT contract_id
, unnest ( range_agg(range) FILTER (WHERE type = 'punch_clock')
- range_agg(range) FILTER (WHERE type = 'billed')
) AS unbilled
FROM time_entries
GROUP BY contract_id
see the test result in dbfiddle
for more information about creating an aggregate function see the manual