sql postgresql window-functions gaps-and-islands

Sum a Column of a Timeseries by an Order Number when the Ordernumber is not unique

I have a table like this: _{demo at db<>fiddle}

CREATE TABLE test(id, order_id, start, end1, count) AS VALUES
(1, 1, '2023-12-19 10:00:00'::timestamp, '2023-12-19 11:00:00'::timestamp, 15),
(2, 1, '2023-12-19 11:00:00', '2023-12-19 12:00:00', 1),
(3, 2, '2023-12-19 12:00:00', '2023-12-19 13:00:00', 2),
(4, 3, '2023-12-19 13:00:00', '2023-12-19 14:00:00', 10),
(5, 1, '2023-12-19 14:00:00', '2023-12-19 15:00:00', 4),
(6, 1, '2023-12-19 15:00:00', '2023-12-19 16:00:00', 7),
(7, 1, '2023-12-19 16:00:00', '2023-12-19 17:00:00', 3),
(8, 3, '2023-12-19 17:00:00', '2023-12-19 18:00:00', 21),
(9, 1, '2023-12-19 18:00:00', '2023-12-19 19:00:00', 5);

The result I like to get, merging adjacent orders together, summing up their count and extending their time range:

order_id	start	end	count
1_{merged from rows 1 and 2}	10:00	12:00	16
2	12:00	13:00	2
3	13:00	14:00	10
1_{merged from rows 5, 6 and 7}	14:00	17:00	14
3	17:00	18:00	21
1	18:00	19:00	5

Last thing I tried:

WITH lp AS (
    SELECT *,LEAD(order_id) OVER(ORDER BY start) AS next_id
    FROM test
)
SELECT order_id,
    MIN(start) AS start,
    MAX(end1) AS end,
    SUM(count) AS count
   FROM lp
    WHERE order_id = next_id
  GROUP BY order_id
  ORDER BY MIN(start);

I know that the query is incorrect. How to include the first line of the table?
Any ideas how to fix?

Solution

You can get current order_id and the previous one using lag() window function which is part of ordered_data CTE.
Create a new group every time a new order is found, otherwise contiguous block of the same order_id gets a unique grp value in grouped_data CTE.
In final select based on order_id and grp, get the sum() of count:

Fiddle

WITH ordered_data AS(
    SELECT*, order_id<>LAG(order_id,1,order_id)OVER(ORDER BY start)AS is_diff
    FROM test
),grouped_data AS(
    SELECT*, count(*)filter(where is_diff)OVER(ORDER BY start)AS grp
    FROM ordered_data)
SELECT order_id
     , MIN(start) AS start
     , MAX(end1) AS end1
     , SUM(count) AS count
FROM grouped_data
GROUP BY order_id, grp
ORDER BY MIN(start);

order_id	start	end1	count
1	2023-12-19 10:00:00	2023-12-19 12:00:00	16
2	2023-12-19 12:00:00	2023-12-19 13:00:00	2
3	2023-12-19 13:00:00	2023-12-19 14:00:00	10
1	2023-12-19 14:00:00	2023-12-19 17:00:00	14
3	2023-12-19 17:00:00	2023-12-19 18:00:00	21
1	2023-12-19 18:00:00	2023-12-19 19:00:00	5