postgresql, timescaledb

Hacking a continuous aggregate (CAGG) on top of another CAGG in TimescaleDB / PostgreSQL


I wanted to follow up on https://github.com/timescale/timescaledb/issues/1400: is it possible to create a continuous aggregate on top of another continuous aggregate, perhaps with some hacking?

I'm planning to insert tick data into the database (maybe every 0.1 s) and run continuous aggregates at 1 second, 1 minute, 1 hour, 1 day, 1 month, 1 year, and 10 years. From my current understanding of CAGGs, aggregating a 10-year bucket directly from the raw data would touch roughly 3600 × 24 × 365 × 10 times as many rows as a 1-second CAGG, but if we build a CAGG on top of a CAGG (hacking it by creating a new hypertable that tracks changes to the previous materialized view), each level would only read the already-aggregated level below it, which should speed things up enormously…

Is my idea valid? I ask because I haven't written any trigger functions in PostgreSQL before...


Solution

  • Here is a simple prototype you can hack on yourself: it feeds a second hypertable from a trigger, acting like a continuous aggregate built on top of another one:

    -- DROP TABLE ticks CASCADE;
    -- DROP TABLE ohlc_1s CASCADE;
    CREATE TABLE ticks ( time TIMESTAMP NOT NULL, symbol varchar, price decimal, volume int);
    CREATE TABLE ohlc_1s ( time TIMESTAMP NOT NULL, symbol varchar, o decimal, h decimal, l decimal, c decimal, v int);
    SELECT create_hypertable('ticks', 'time');
    SELECT create_hypertable('ohlc_1s', 'time');
    

    You can then use a trigger to feed the ohlc_1s table:

    CREATE OR REPLACE FUNCTION feed_ohlc_1s() RETURNS trigger AS
    $BODY$
    DECLARE
        last_time timestamp;
    BEGIN
   -- Find the 1-second bucket of the newest tick already stored for this symbol
   SELECT time_bucket('1 second', time) INTO last_time
       FROM ticks WHERE symbol = NEW.symbol
       ORDER BY time DESC LIMIT 1;
    
   -- The new tick starts a later second: flush the just-completed bucket
       IF NEW.time - last_time >= INTERVAL '1 second' THEN
      INSERT INTO ohlc_1s (time, symbol, o, h, l, c, v)
        SELECT time_bucket('1 second', time) as time,
          symbol,
          FIRST(price, time) as open,
          MAX(price) as high,
          MIN(price) as low,
          LAST(price, time) as close,
          SUM(volume) as volume
        FROM ticks
        -- Only aggregate the just-completed bucket of this symbol
        WHERE symbol = NEW.symbol AND time >= last_time
        GROUP BY 1, 2 ORDER BY 1 DESC LIMIT 1;
   END IF;
      RETURN NEW;
    END;
    $BODY$
    LANGUAGE plpgsql;
    
    CREATE TRIGGER feed_ohlc_every_new_second
                   BEFORE INSERT
                   ON ticks
                   FOR EACH ROW
                   EXECUTE PROCEDURE feed_ohlc_1s();
    

    Now let's run a few example inserts:

    INSERT INTO ticks VALUES 
    ('2021-08-26 10:09:00.01'::timestamp, 'SYMBOL', 10.1, 100),
    ('2021-08-26 10:09:00.08'::timestamp, 'SYMBOL', 10.0, 100),
    ('2021-08-26 10:09:00.23'::timestamp, 'SYMBOL', 10.2, 100),
    ('2021-08-26 10:09:00.40'::timestamp, 'SYMBOL', 10.3, 100);
    table ticks;
    table ohlc_1s;
    

    As you can see, while we are still within the same second, no OHLC row is generated yet:

    ┌────────────────────────┬────────┬───────┬────────┐
    │          time          │ symbol │ price │ volume │
    ├────────────────────────┼────────┼───────┼────────┤
    │ 2021-08-26 10:09:00.01 │ SYMBOL │  10.1 │    100 │
    │ 2021-08-26 10:09:00.08 │ SYMBOL │  10.0 │    100 │
    │ 2021-08-26 10:09:00.23 │ SYMBOL │  10.2 │    100 │
    │ 2021-08-26 10:09:00.4  │ SYMBOL │  10.3 │    100 │
    └────────────────────────┴────────┴───────┴────────┘
    (4 rows)
    
    ┌──────┬────────┬───┬───┬───┬───┬───┐
    │ time │ symbol │ o │ h │ l │ c │ v │
    ├──────┼────────┼───┼───┼───┼───┼───┤
    └──────┴────────┴───┴───┴───┴───┴───┘
    (0 rows)
    

    Now let's insert two more seconds of ticks:

    INSERT INTO ticks VALUES 
    ('2021-08-26 10:09:01.02'::timestamp, 'SYMBOL', 10.0, 100),
    ('2021-08-26 10:09:01.04'::timestamp, 'SYMBOL', 14.0, 200),
    ('2021-08-26 10:09:01.42'::timestamp, 'SYMBOL', 12.3, 200),
    ('2021-08-26 10:09:01.62'::timestamp, 'SYMBOL', 8.3, 200),
    ('2021-08-26 10:09:02.80'::timestamp, 'SYMBOL', 9.0, 500);
    table ticks;
    table ohlc_1s;
    

    Here is the output:

    ┌────────────────────────┬────────┬───────┬────────┐
    │          time          │ symbol │ price │ volume │
    ├────────────────────────┼────────┼───────┼────────┤
    │ 2021-08-26 10:09:00.01 │ SYMBOL │  10.1 │    100 │
    │ 2021-08-26 10:09:00.08 │ SYMBOL │  10.0 │    100 │
    │ 2021-08-26 10:09:00.23 │ SYMBOL │  10.2 │    100 │
    │ 2021-08-26 10:09:00.4  │ SYMBOL │  10.3 │    100 │
    │ 2021-08-26 10:09:01.02 │ SYMBOL │  10.0 │    100 │
    │ 2021-08-26 10:09:01.04 │ SYMBOL │  14.0 │    200 │
    │ 2021-08-26 10:09:01.42 │ SYMBOL │  12.3 │    200 │
    │ 2021-08-26 10:09:01.62 │ SYMBOL │   8.3 │    200 │
    │ 2021-08-26 10:09:02.8  │ SYMBOL │   9.0 │    500 │
    └────────────────────────┴────────┴───────┴────────┘
    (9 rows)
    
    ┌─────────────────────┬────────┬──────┬──────┬──────┬──────┬─────┐
    │        time         │ symbol │  o   │  h   │  l   │  c   │  v  │
    ├─────────────────────┼────────┼──────┼──────┼──────┼──────┼─────┤
    │ 2021-08-26 10:09:00 │ SYMBOL │ 10.1 │ 10.3 │ 10.0 │ 10.3 │ 400 │
    │ 2021-08-26 10:09:01 │ SYMBOL │ 10.0 │ 14.0 │  8.3 │  8.3 │ 700 │
    └─────────────────────┴────────┴──────┴──────┴──────┴──────┴─────┘
    (2 rows)
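
    Note that the newest tick (2021-08-26 10:09:02.8) has not produced an ohlc_1s row yet, because the trigger only flushes a bucket once a tick from a later second arrives. If you also need the still-open second, a minimal sketch is to union the materialized rows with an on-the-fly aggregate of the ticks that have not been flushed yet (the view name ohlc_1s_rt is made up here, and it assumes the tables above):

    -- Materialized seconds plus the bucket the trigger has not flushed yet
    CREATE VIEW ohlc_1s_rt AS
    SELECT * FROM ohlc_1s
    UNION ALL
    SELECT time_bucket('1 second', time) AS time,
      symbol,
      FIRST(price, time) AS o,
      MAX(price) AS h,
      MIN(price) AS l,
      LAST(price, time) AS c,
      SUM(volume) AS v
    FROM ticks
    -- Only the ticks newer than the last materialized 1-second bucket
    WHERE time >= (SELECT coalesce(max(time) + INTERVAL '1 second', '-infinity') FROM ohlc_1s)
    GROUP BY 1, 2;

    Querying this view at this point would also show the pending 2021-08-26 10:09:02 bucket (o, h, l, c = 9.0 and v = 500), computed on the fly from the raw ticks.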
    

    Chaining this to bigger timeframes (for example, feeding a 1-minute table from ohlc_1s) should be just as straightforward; a sketch of one more level is below.
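
    For instance, here is a minimal sketch of one more level built the same way; the ohlc_1m table, the feed_ohlc_1m() function, and the trigger name are made up for illustration, and it keeps the same structure as the 1-second trigger above:

    CREATE TABLE ohlc_1m ( time TIMESTAMP NOT NULL, symbol varchar, o decimal, h decimal, l decimal, c decimal, v int);
    SELECT create_hypertable('ohlc_1m', 'time');

    CREATE OR REPLACE FUNCTION feed_ohlc_1m() RETURNS trigger AS
    $BODY$
    DECLARE
        last_time timestamp;
    BEGIN
       -- 1-minute bucket of the newest 1-second row already stored for this symbol
       SELECT time_bucket('1 minute', time) INTO last_time
       FROM ohlc_1s WHERE symbol = NEW.symbol
       ORDER BY time DESC LIMIT 1;

       -- The new 1-second row starts a later minute: flush the just-completed minute
       IF NEW.time - last_time >= INTERVAL '1 minute' THEN
          INSERT INTO ohlc_1m (time, symbol, o, h, l, c, v)
            SELECT time_bucket('1 minute', time) as time,
              symbol,
              FIRST(o, time) as open,
              MAX(h) as high,
              MIN(l) as low,
              LAST(c, time) as close,
              SUM(v) as volume
            FROM ohlc_1s
            WHERE symbol = NEW.symbol AND time >= last_time
            GROUP BY 1, 2 ORDER BY 1 DESC LIMIT 1;
       END IF;
       RETURN NEW;
    END;
    $BODY$
    LANGUAGE plpgsql;

    CREATE TRIGGER feed_ohlc_every_new_minute
                   BEFORE INSERT
                   ON ohlc_1s
                   FOR EACH ROW
                   EXECUTE PROCEDURE feed_ohlc_1m();

    Each level reads only the already-aggregated level below it, so the amount of data scanned per flush stays small no matter how coarse the top timeframe is, which is exactly the speedup asked about in the question.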