postgresql aggregate aggregate-functions difference

Aggregate difference between time ranges in Postgres

I'm on Postgres 15 and using the multirange type to aggregate overlapping time ranges into islands:

CREATE TABLE time_entries (
    id bigint NOT NULL,
    contract_id bigint,
    "from" timestamp(6) without time zone,
    "to" timestamp(6) without time zone,
    type varchar,
    range tsrange GENERATED ALWAYS AS (tsrange("from", "to")) STORED
);

INSERT INTO time_entries VALUES (1, 1, '2022-12-07T09:00', '2022-12-07T10:00', 'billed');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T08:00', '2022-12-07T10:30', 'punch_clock');

INSERT INTO time_entries VALUES (1, 1, '2022-12-07T12:00', '2022-12-07T12:30', 'billed');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T11:30', '2022-12-07T12:15', 'punch_clock');

INSERT INTO time_entries VALUES (2, 1, '2022-12-07T13:00', '2022-12-07T13:30', 'billed');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T13:15', '2022-12-07T13:45', 'punch_clock');

INSERT INTO time_entries VALUES (2, 1, '2022-12-07T14:00', '2022-12-07T15:00', 'punch_clock');

SELECT contract_id, unnest(range_agg(range)) AS range FROM time_entries GROUP BY contract_id;

Working db<>fiddle: https://dbfiddle.uk/V9a7H8nJ

This results in these merged time ranges:

contract_id	range
1	["2022-12-07 08:00:00","2022-12-07 10:30:00")
1	["2022-12-07 11:30:00","2022-12-07 12:30:00")
1	["2022-12-07 13:00:00","2022-12-07 13:45:00")
1	["2022-12-07 14:00:00","2022-12-07 15:00:00")

But now I need another metric: the unbilled hours. I have two types of entries, punch_clock and billed. The former is a tracker that runs in the background when they are working, and the latter is a parallel tracker that runs for specific projects.

How could I calculate the difference between the overlapping punch_clock and billed entries?

The desired result would be:

contract_id	unbilled
1	["2022-12-07 08:00:00","2022-12-07 09:00:00")
1	["2022-12-07 10:00:00","2022-12-07 10:30:00")
1	["2022-12-07 11:30:00","2022-12-07 12:00:00")
1	["2022-12-07 13:30:00","2022-12-07 13:45:00")
1	["2022-12-07 14:00:00","2022-12-07 15:00:00")

I played around by grouping by type, but it seems there is no range_difference_agg aggregation function for multiranges, only range_agg for union and range_intersect_agg for intersections: https://www.postgresql.org/docs/current/functions-aggregate.html

Solution

First, I suggest changing the definition of the time_entries table so that the range column has type tsmultirange instead of tsrange:

CREATE TABLE time_entries (
    id bigint NOT NULL,
    contract_id bigint,
    "from" timestamp(6) without time zone,
    "to" timestamp(6) without time zone,
    type varchar,
    range tsmultirange GENERATED ALWAYS AS (tsmultirange(tsrange("from", "to"))) STORED
);

Then, given the sample data you provided, you don't need an aggregate function. A self-join query should produce the expected result:

SELECT p.contract_id, unnest(CASE WHEN b.range IS NULL THEN p.range ELSE p.range - b.range END) AS unbilled
  FROM ( SELECT contract_id, range FROM time_entries WHERE type = 'punch_clock') AS p
  LEFT JOIN (SELECT contract_id, range FROM time_entries WHERE type = 'billed') AS b
    ON p.contract_id = b.contract_id
   AND p.range && b.range

Result:

contract_id	unbilled
1	["2022-12-07 08:00:00","2022-12-07 09:00:00")
1	["2022-12-07 10:00:00","2022-12-07 10:30:00")
1	["2022-12-07 11:30:00","2022-12-07 12:00:00")
1	["2022-12-07 13:30:00","2022-12-07 13:45:00")
1	["2022-12-07 14:00:00","2022-12-07 15:00:00")
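
To see what the self-join computes for a single pair of rows, here is a minimal sketch using literal values instead of the table (the literals mirror the first and last punch_clock entries of the sample data). The first statement should leave the hour before and the half hour after the billed range. The second shows that subtracting NULL yields NULL, which is why rows without a matching 'billed' entry need the CASE fallback:

```sql
-- Multirange difference: remove the billed hour from the punch_clock range.
SELECT '{[2022-12-07 08:00, 2022-12-07 10:30)}'::tsmultirange
     - '{[2022-12-07 09:00, 2022-12-07 10:00)}'::tsmultirange AS unbilled;

-- An unmatched punch_clock row from the LEFT JOIN has b.range = NULL,
-- and anything minus NULL is NULL, so the whole range would be lost.
SELECT '{[2022-12-07 14:00, 2022-12-07 15:00)}'::tsmultirange
     - NULL::tsmultirange AS unbilled;
```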

This query works only as long as at most one 'billed' time range intersects each 'punch_clock' time range. If several 'billed' time ranges may intersect the same 'punch_clock' time range, you will need to create your own aggregate function based on the multirange difference operator:

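-- State transition function: x is the running state, y the punch_clock range, z one billed range.
-- On the first row (x IS NULL) start from y, afterwards keep subtracting from x.
-- COALESCE keeps the left operand when z is NULL (no matching billed row).
CREATE OR REPLACE FUNCTION multirange_diff (x anymultirange, y anymultirange, z anymultirange)
RETURNS anymultirange LANGUAGE sql IMMUTABLE AS $$
SELECT CASE WHEN x IS NULL THEN COALESCE(y-z, y) ELSE COALESCE(x-z, x) END ; $$ ;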

CREATE OR REPLACE AGGREGATE multirange_diff_agg(anymultirange, anymultirange)
( stype = anymultirange, sfunc = multirange_diff) ;

The query using the new aggregate function is:

SELECT p.contract_id, unnest(multirange_diff_agg(p.range, b.range)) AS unbilled
  FROM ( SELECT contract_id, range FROM time_entries WHERE type = 'punch_clock') AS p
  LEFT JOIN (SELECT contract_id, range FROM time_entries WHERE type = 'billed') AS b
    ON p.contract_id = b.contract_id
   AND p.range && b.range
 GROUP BY p.contract_id, p.range

As an example, after inserting this new row into the time_entries table:

INSERT INTO time_entries VALUES (1, 1, '2022-12-07T10:10', '2022-12-07T10:20', 'billed');

The result of the above query is:

contract_id	unbilled
1	["2022-12-07 08:00:00","2022-12-07 09:00:00")
1	["2022-12-07 10:00:00","2022-12-07 10:10:00")
1	["2022-12-07 10:20:00","2022-12-07 10:30:00")
1	["2022-12-07 11:30:00","2022-12-07 12:00:00")
1	["2022-12-07 13:30:00","2022-12-07 13:45:00")
1	["2022-12-07 14:00:00","2022-12-07 15:00:00")
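
Conceptually, the aggregate folds the difference operator over every 'billed' range that overlaps a given 'punch_clock' range. As a rough sketch with literal values (mirroring the 08:00 to 10:30 punch_clock entry and its two overlapping billed entries), the same result for that group can be written as a chained subtraction:

```sql
-- Start from the punch_clock range, then subtract each overlapping billed range in turn.
SELECT unnest(
         '{[2022-12-07 08:00, 2022-12-07 10:30)}'::tsmultirange
       - '{[2022-12-07 09:00, 2022-12-07 10:00)}'::tsmultirange
       - '{[2022-12-07 10:10, 2022-12-07 10:20)}'::tsmultirange
       ) AS unbilled;
```

This should return the first three rows of the result above. The order in which the billed ranges are subtracted does not matter.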

UPDATE

Even when several 'billed' time ranges intersect the same 'punch_clock' time range, you don't need a custom aggregate function. The following query, which only uses the built-in range_agg, should provide the expected result:

SELECT contract_id
     , unnest ( range_agg(range) FILTER (WHERE type = 'punch_clock')
              - range_agg(range) FILTER (WHERE type = 'billed')
              ) AS unbilled
  FROM time_entries
 GROUP BY contract_id

See the test result in dbfiddle.

For more information about creating an aggregate function, see the manual.
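
One edge case worth checking in the UPDATE query: if a contract has no 'billed' rows at all, the filtered range_agg returns NULL, the subtraction returns NULL, and that contract silently disappears from the output. A possible variant, offered as a sketch rather than a tested fix, falls back to an empty multirange. The outer query then sums the length of each unbilled range to get the total unbilled time per contract; the alias names u and unbilled_time are made up for illustration:

```sql
SELECT contract_id
     , sum(upper(unbilled) - lower(unbilled)) AS unbilled_time  -- interval per contract
  FROM ( SELECT contract_id
              , unnest ( range_agg(range) FILTER (WHERE type = 'punch_clock')
                       -- Assumption: treat "no billed rows" as an empty multirange.
                       - COALESCE( range_agg(range) FILTER (WHERE type = 'billed')
                                 , '{}'::tsmultirange )
                       ) AS unbilled
           FROM time_entries
          GROUP BY contract_id
       ) AS u
 GROUP BY contract_id;
```

The inner query works with either definition of the range column, since range_agg returns a tsmultirange in both cases (accepting multirange input requires Postgres 15).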