postgresql time-series window-functions timescaledb

Explode (duplicate) records based on timestamp range (PostgreSQL)

I'm trying to convert a timeseries recordset into something a bit more suitable for data analysis. Consider this following contiguous recordset (From, To, Value)

"2019-10-03 03:58:21+00"    "2019-10-03 03:59:04+00"    10
"2019-10-03 03:59:04+00"    "2019-10-03 03:59:54+00"    15
"2019-10-03 03:59:54+00"    "2019-10-03 04:02:00+00"    20
"2019-10-03 04:02:00+00"    "2019-10-03 04:02:10+00"    25

I want one record at the start of each minute, at the cost of "duplicated" rows.

"2019-10-03 03:58:21+00"    "2019-10-03 03:59:00+00"    10
"2019-10-03 03:59:00+00"    "2019-10-03 03:59:04+00"    10
"2019-10-03 03:59:04+00"    "2019-10-03 03:59:54+00"    15
"2019-10-03 03:59:54+00"    "2019-10-03 04:00:00+00"    20
"2019-10-03 04:00:00+00"    "2019-10-03 04:01:00+00"    20
"2019-10-03 04:01:00+00"    "2019-10-03 04:02:00+00"    20
"2019-10-03 04:02:00+00"    "2019-10-03 04:02:10+00"    25

First row is duplicated once because it overlaps a minute. Second is not duplicated because it's within a minute. Third is duplicated twice because it overlaps 3 minutes. Last is not exploded because it's within a minute (but it also starts at a minute). The value itself remains the same across exploded rows.

I have the feeling that my solution is a combination of Window functions (lead/lag?), generate_series() and various date_part / date_trunc calls. Possibly a cross or lateral self-join...

Currently using PostgreSQL 13.4 so I should have access to the latest and greatest API. I'm also using timescaledb 2.4.2 and hypertables if any of this helps, though their time_bucket function seems more about reducing row count, not increasing it.

Would love a push in the right direction!

Solution

As I expected, it was a combination of generate_series and window functions. But I didn't expect to have to create my own locf function, I thought LEAD/LAG had options to remember the last known / non-null values.

The following code takes a few known records, and unions them against a generated series of timestamps.

I needed to use DISTINCT ON to purge the generated records that already had known equivalents.

Then I can finally use LEAD for the "next date" and locf_any for the value carried forward.

--https://www.joyofdata.de/blog/locf-linear-imputation-postgresql-tutorial/

DROP FUNCTION locf_s(ANYELEMENT, ANYELEMENT) CASCADE;
CREATE OR REPLACE FUNCTION locf_s(a ANYELEMENT, b ANYELEMENT)
RETURNS ANYELEMENT
LANGUAGE sql
AS '
  SELECT COALESCE(b, a)
';

DROP AGGREGATE IF EXISTS locf_any(ANYELEMENT);
CREATE AGGREGATE locf_any(ANYELEMENT) (
  SFUNC = locf_s,
  STYPE = ANYELEMENT
);

SELECT from_time, LEAD(from_time) OVER W, locf_any(reading) OVER W
FROM
(
  SELECT DISTINCT ON (from_time) from_time, reading FROM
  (
    WITH readings (from_time, reading) AS (VALUES
     ('2019-10-03 03:58:21+00', 10),
     ('2019-10-03 03:59:04+00', 15),
     ('2019-10-03 03:59:54+00', 20),
     ('2019-10-03 04:02:00+00', 25)
    )
    (
      SELECT from_time::TIMESTAMPTZ, reading::INTEGER FROM readings
      UNION ALL
      SELECT generate_series('2019-10-03 03:59:00+00', '2019-10-03 04:04:00+00', '1 minute'::INTERVAL)::TIMESTAMPTZ, NULL
    )
  ) X
  ORDER BY from_time, reading NULLS LAST
) Y
WINDOW W AS (ORDER BY from_time ASC);