Search code examples
sqlpostgresqlwindow-functionsgaps-and-islands

Sum a Column of a Timeseries by an Order Number when the Ordernumber is not unique


I have a table like this: demo at db<>fiddle

CREATE TABLE test(id, order_id, start, end1, count) AS VALUES
(1, 1, '2023-12-19 10:00:00'::timestamp, '2023-12-19 11:00:00'::timestamp, 15),
(2, 1, '2023-12-19 11:00:00', '2023-12-19 12:00:00', 1),
(3, 2, '2023-12-19 12:00:00', '2023-12-19 13:00:00', 2),
(4, 3, '2023-12-19 13:00:00', '2023-12-19 14:00:00', 10),
(5, 1, '2023-12-19 14:00:00', '2023-12-19 15:00:00', 4),
(6, 1, '2023-12-19 15:00:00', '2023-12-19 16:00:00', 7),
(7, 1, '2023-12-19 16:00:00', '2023-12-19 17:00:00', 3),
(8, 3, '2023-12-19 17:00:00', '2023-12-19 18:00:00', 21),
(9, 1, '2023-12-19 18:00:00', '2023-12-19 19:00:00', 5);

The result I like to get, merging adjacent orders together, summing up their count and extending their time range:

order_id start end count
1 merged from rows 1 and 2 10:00 12:00 16
2 12:00 13:00 2
3 13:00 14:00 10
1 merged from rows 5, 6 and 7 14:00 17:00 14
3 17:00 18:00 21
1 18:00 19:00 5

Last thing I tried:

WITH lp AS (
    SELECT *,LEAD(order_id) OVER(ORDER BY start) AS next_id
    FROM test
)
SELECT order_id,
    MIN(start) AS start,
    MAX(end1) AS end,
    SUM(count) AS count
   FROM lp
    WHERE order_id = next_id
  GROUP BY order_id
  ORDER BY MIN(start);

I know that the query is incorrect. How to include the first line of the table?
Any ideas how to fix?


Solution

    1. You can get current order_id and the previous one using lag() window function which is part of ordered_data CTE.
    2. Create a new group every time a new order is found, otherwise contiguous block of the same order_id gets a unique grp value in grouped_data CTE.
    3. In final select based on order_id and grp, get the sum() of count:

    Fiddle

    WITH ordered_data AS(
        SELECT*, order_id<>LAG(order_id,1,order_id)OVER(ORDER BY start)AS is_diff
        FROM test
    ),grouped_data AS(
        SELECT*, count(*)filter(where is_diff)OVER(ORDER BY start)AS grp
        FROM ordered_data)
    SELECT order_id
         , MIN(start) AS start
         , MAX(end1) AS end1
         , SUM(count) AS count
    FROM grouped_data
    GROUP BY order_id, grp
    ORDER BY MIN(start);
    
    order_id start end1 count
    1 2023-12-19 10:00:00 2023-12-19 12:00:00 16
    2 2023-12-19 12:00:00 2023-12-19 13:00:00 2
    3 2023-12-19 13:00:00 2023-12-19 14:00:00 10
    1 2023-12-19 14:00:00 2023-12-19 17:00:00 14
    3 2023-12-19 17:00:00 2023-12-19 18:00:00 21
    1 2023-12-19 18:00:00 2023-12-19 19:00:00 5