Building a timeseries with cumulative sums based on singe grouped events
To begin I need to highlight I am quite new to SQL and the current state was achieved with the help of stack overflow.
I have a table containing events of different kinds as they occur. I would like to transform this data into a time series filling the time gaps between the events and cumulating the numbers across the different kind of events.
So it is all about creating a timeseries from single events and calculate the running/cumulative sum across groups of events
Here the example of source data:
Source:
event_timestamp type value
01.01.2023 10:00 1 10
03.01.2023 10:00 2 10
05.01.2023 10:00 2 10
07.01.2023 10:00 1 10
Desired outcome:
event_timestamp type value cumulative_sum
01.01.2023 10:00 1 10 10
02.01.2023 10:00 1 0 10
03.01.2023 10:00 1 0 10
03.01.2023 10:00 2 10 10
04.01.2023 10:00 1 0 10
04.01.2023 10:00 2 0 10
05.01.2023 10:00 1 0 10
05.01.2023 10:00 2 10 20
06.01.2023 10:00 1 0 10
06.01.2023 10:00 2 0 20
07.01.2023 10:00 1 10 20
07.01.2023 10:00 2 0 20
I got to a point, where I can create this (limited to a single event type):
time type value cumulative_sum
01.01.2023 10:00 1 10 10
02.01.2023 10:00 1 0 10
03.01.2023 10:00 1 10 20
04.01.2023 10:00 1 0 20
05.01.2023 10:00 1 0 20
06.01.2023 10:00 1 0 20
07.01.2023 10:00 1 0 20
with the following sql statement (postgresql):
SELECT
generate_series AS timestamp,
-- hard coded event type below
COALESCE(events.type, 1) AS type,
COALESCE(events.value, 0) AS value,
COALESCE(SUM(td.value) OVER (ORDER BY generate_series), 0) AS cumulative_sum
FROM
generate_series('2023-01-01'::timestamp, '2023-01-07'::timestamp, '1 day') AS generate_series
LEFT JOIN
-- hard coded event type below
events ON generate_series = events.event_timestamp AND event.type = 1
ORDER BY
generate_series;
Now I am questioning if I should try to find the SQL statement doing all in a better way (no idea how) or if I should use a python script to just run this statement per event type and sequentally , insert the data into the table.
Questions summarized:
You're very close. All you need is to create the series for each type. A cross join can do that easily.
Check this out:
SELECT generate_series AS timestamp
, etypes.type
, COALESCE(events.value, 0) AS value
, COALESCE(SUM(events.value) OVER (PARTITION BY etypes.type ORDER BY generate_series), 0) AS cumulative_sum
FROM generate_series('2023-01-01'::timestamp, '2023-01-07'::timestamp, '1 day') AS generate_series
CROSS JOIN (SELECT DISTINCT type FROM events) AS etypes
LEFT JOIN events ON generate_series = events.event_timestamp AND events.type = etypes.type
ORDER BY generate_series, etypes.type
;
With no data in the given range, the result is this:
timestamp | type | value | cumulative_sum |
---|---|---|---|
2023-01-01 00:00:00 | 1 | 0 | 0 |
2023-01-01 00:00:00 | 3 | 0 | 0 |
2023-01-02 00:00:00 | 1 | 0 | 0 |
2023-01-02 00:00:00 | 3 | 0 | 0 |
2023-01-03 00:00:00 | 1 | 0 | 0 |
2023-01-03 00:00:00 | 3 | 0 | 0 |
2023-01-04 00:00:00 | 1 | 0 | 0 |
2023-01-04 00:00:00 | 3 | 0 | 0 |
2023-01-05 00:00:00 | 1 | 0 | 0 |
2023-01-05 00:00:00 | 3 | 0 | 0 |
2023-01-06 00:00:00 | 1 | 0 | 0 |
2023-01-06 00:00:00 | 3 | 0 | 0 |
2023-01-07 00:00:00 | 1 | 0 | 0 |
2023-01-07 00:00:00 | 3 | 0 | 0 |
Test case (with a little data added)
You could also create the series with different range for each type, based on the start and end for that type.
Here's an example of that:
SELECT generate_series AS timestamp
, etypes.type
, COALESCE(events.value, 0) AS value
, COALESCE(SUM(events.value) OVER (PARTITION BY etypes.type ORDER BY generate_series), 0) AS cumulative_sum
FROM (SELECT type
, MIN(event_timestamp) AS mints
, MAX(event_timestamp) AS maxts
FROM events
GROUP BY type
) AS etypes
JOIN LATERAL (SELECT generate_series(mints, maxts, '1 day') AS generate_series) AS gs ON 1 = 1
LEFT JOIN events ON generate_series = events.event_timestamp AND events.type = etypes.type
ORDER BY generate_series, etypes.type
;
Result with your data:
timestamp | type | value | cumulative_sum |
---|---|---|---|
2023-01-01 00:00:00 | 1 | 10 | 10 |
2023-01-02 00:00:00 | 1 | 0 | 10 |
2023-01-03 00:00:00 | 1 | 0 | 10 |
2023-01-03 00:00:00 | 2 | 10 | 10 |
2023-01-04 00:00:00 | 1 | 0 | 10 |
2023-01-04 00:00:00 | 2 | 0 | 10 |
2023-01-05 00:00:00 | 1 | 0 | 10 |
2023-01-05 00:00:00 | 2 | 10 | 20 |
2023-01-06 00:00:00 | 1 | 0 | 10 |
2023-01-07 00:00:00 | 1 | 10 | 20 |