Search code examples
postgresqlaggregateaggregate-functionsdifference

Aggregate difference between time ranges in Postgres


I'm on Postgres 15 and using the multirange type to aggregate overlapping time ranges into islands:

CREATE TABLE time_entries (
    id bigint NOT NULL,
    contract_id bigint,
    "from" timestamp(6) without time zone,
    "to" timestamp(6) without time zone,
    type varchar,
    range tsrange GENERATED ALWAYS AS (tsrange("from", "to")) STORED
);

INSERT INTO time_entries VALUES (1, 1, '2022-12-07T09:00', '2022-12-07T10:00', 'billed');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T08:00', '2022-12-07T10:30', 'punch_clock');

INSERT INTO time_entries VALUES (1, 1, '2022-12-07T12:00', '2022-12-07T12:30', 'billed');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T11:30', '2022-12-07T12:15', 'punch_clock');

INSERT INTO time_entries VALUES (2, 1, '2022-12-07T13:00', '2022-12-07T13:30', 'billed');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T13:15', '2022-12-07T13:45', 'punch_clock');

INSERT INTO time_entries VALUES (2, 1, '2022-12-07T14:00', '2022-12-07T15:00', 'punch_clock');

SELECT contract_id, unnest(range_agg(range)) AS range FROM time_entries GROUP BY contract_id;

Working db<>fiddle: https://dbfiddle.uk/V9a7H8nJ

This results in these merged time ranges:

contract_id range
1 ["2022-12-07 08:00:00","2022-12-07 10:30:00")
1 ["2022-12-07 11:30:00","2022-12-07 12:30:00")
1 ["2022-12-07 13:00:00","2022-12-07 13:45:00")
1 ["2022-12-07 14:00:00","2022-12-07 15:00:00")

But now, I need another metric: The unbilled hours of a tracking. I have two types of entries punch_clock and billed. The former is a tracker that runs in the background when they are working, and the latter is a parallel tracker that runs for specific projects.

How could I calculate the difference between the overlapping punch_clock and billed entries?

The desired result would be:

contract_id unbilled
1 ["2022-12-07 08:00:00","2022-12-07 09:00:00")
1 ["2022-12-07 10:00:00","2022-12-07 10:30:00")
1 ["2022-12-07 11:30:00","2022-12-07 12:00:00")
1 ["2022-12-07 13:30:00","2022-12-07 13:45:00")
1 ["2022-12-07 14:00:00","2022-12-07 15:00:00")

I played around by grouping by type, but it seems there is no range_difference_agg aggregation function for multiranges, only range_agg for union and range_intersect_agg for intersections: https://www.postgresql.org/docs/current/functions-aggregate.html


Solution

  • First I suggest you to change the definition of the time_entries table so that to replace the type tsrange of the range column by tsmultirange :

    CREATE TABLE time_entries (
        id bigint NOT NULL,
        contract_id bigint,
        "from" timestamp(6) without time zone,
        "to" timestamp(6) without time zone,
        type varchar,
        range tsmultirange GENERATED ALWAYS AS (tsmultirange(tsrange("from", "to"))) STORED
    );
    

    Then, according to the data sample you provide, you don't need an aggregate function. A self-join query should provide the expected result :

    SELECT p.contract_id, unnest(CASE WHEN b.range IS NULL THEN p.range ELSE p.range - b.range END) AS unbilled
      FROM ( SELECT contract_id, range FROM time_entries WHERE type = 'punch_clock') AS p
      LEFT JOIN (SELECT contract_id, range FROM time_entries WHERE type = 'billed') AS b
        ON p.contract_id = b.contract_id
       AND p.range && b.range
    

    Result :

    contract_id unbilled
    1 ["2022-12-07 08:00:00","2022-12-07 09:00:00")
    1 ["2022-12-07 10:00:00","2022-12-07 10:30:00")
    1 ["2022-12-07 11:30:00","2022-12-07 12:00:00")
    1 ["2022-12-07 13:30:00","2022-12-07 13:45:00")
    1 ["2022-12-07 14:00:00","2022-12-07 15:00:00")

    This query will work while only one 'billed' timerange intersects with one 'punch_clock' time range. If several 'billed' timeranges may intersect the same 'punch_clock' time range, then you will need to create your own aggregate function based on the multirange difference operator :

    CREATE OR REPLACE FUNCTION multirange_diff (x anymultirange, y anymultirange, z anymultirange)
    RETURNS anymultirange LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE WHEN x IS NULL THEN COALESCE(y-z, y) ELSE COALESCE(x-z, x) END ; $$ ;
    
    CREATE OR REPLACE AGGREGATE multirange_diff_agg(anymultirange, anymultirange)
    ( stype = anymultirange, sfunc = multirange_diff) ;
    

    The query using the new aggregate function is :

    SELECT p.contract_id, unnest(multirange_diff_agg(p.range, b.range)) AS unbilled
      FROM ( SELECT contract_id, range FROM time_entries WHERE type = 'punch_clock') AS p
      LEFT JOIN (SELECT contract_id, range FROM time_entries WHERE type = 'billed') AS b
        ON p.contract_id = b.contract_id
       AND p.range && b.range
     GROUP BY p.contract_id, p.range
    

    As an example, after having inserted the new row in table time_entries :

    INSERT INTO time_entries VALUES (1, 1, '2022-12-07T10:10', '2022-12-07T10:20', 'billed');
    

    The result of the above query is :

    contract_id unbilled
    1 ["2022-12-07 08:00:00","2022-12-07 09:00:00")
    1 ["2022-12-07 10:00:00","2022-12-07 10:10:00")
    1 ["2022-12-07 10:20:00","2022-12-07 10:30:00")
    1 ["2022-12-07 11:30:00","2022-12-07 12:00:00")
    1 ["2022-12-07 13:30:00","2022-12-07 13:45:00")
    1 ["2022-12-07 14:00:00","2022-12-07 15:00:00")

    UPDATE

    Even when several 'billed' timeranges intersect the same 'punch_clock' time range, you don't need an aggregate function. The following query should provide the expected result :

    SELECT contract_id
         , unnest ( range_agg(range) FILTER (WHERE type = 'punch_clock')
                  - range_agg(range) FILTER (WHERE type = 'billed')
                  ) AS unbilled
      FROM time_entries
     GROUP BY contract_id
    

    see the test result in dbfiddle

    for more information about creating an aggregate function see the manual