postgresql, timescaledb

Hacking a continuous aggregate (CAGG) on top of another CAGG in TimescaleDB / PostgreSQL


I wanted to follow up on https://github.com/timescale/timescaledb/issues/1400: is it possible to create a continuous aggregate on top of another continuous aggregate, perhaps with some hacking?

I'm planning to insert tick data into the database (maybe every 0.1 s) and run continuous aggregates at 1 second, 1 minute, 1 hour, 1 day, 1 month, 1 year, and 10 years. From my current understanding of CAGGs, aggregating a 10-year bucket directly from the raw data would touch roughly 3600 × 24 × 365 × 10 times as many rows as a 1-second CAGG, but if we build a CAGG on top of a CAGG (hacking it by creating a new hypertable that tracks changes to the previous materialized view), each level would only read the already-aggregated level below it, which should speed things up enormously…

Is my idea valid? I ask because I haven't written any trigger functions in PostgreSQL before...


Solution

  • Here is a simple prototype you can hack on yourself: it feeds a second hypertable from a trigger, acting like a continuous aggregate built on top of another one:

    -- DROP TABLE ticks CASCADE;
    -- DROP TABLE ohlc_1s CASCADE;
    CREATE TABLE ticks ( time TIMESTAMP NOT NULL, symbol varchar, price decimal, volume int);
    CREATE TABLE ohlc_1s ( time TIMESTAMP NOT NULL, symbol varchar, o decimal, h decimal, l decimal, c decimal, v int);
    SELECT create_hypertable('ticks', 'time');
    SELECT create_hypertable('ohlc_1s', 'time');
    

    You can then use a trigger to feed the ohlc_1s table:

    CREATE OR REPLACE FUNCTION feed_ohlc_1s() RETURNS trigger AS
    $BODY$
    DECLARE
        last_time timestamp;
    BEGIN
   -- Find the 1-second bucket of the newest tick already stored for this symbol
   SELECT time_bucket('1 second', time) INTO last_time
       FROM ticks WHERE symbol = NEW.symbol
       ORDER BY time DESC LIMIT 1;
    
   -- The new tick starts a later second: flush the just-completed bucket
       IF NEW.time - last_time >= INTERVAL '1 second' THEN
      INSERT INTO ohlc_1s (time, symbol, o, h, l, c, v)
        SELECT time_bucket('1 second', time) as time,
          symbol,
          FIRST(price, time) as open,
          MAX(price) as high,
          MIN(price) as low,
          LAST(price, time) as close,
          SUM(volume) as volume
        FROM ticks
        -- Only aggregate the just-completed bucket of this symbol
        WHERE symbol = NEW.symbol AND time >= last_time
        GROUP BY 1, 2 ORDER BY 1 DESC LIMIT 1;
   END IF;
      RETURN NEW;
    END;
    $BODY$
    LANGUAGE plpgsql;
    
    CREATE TRIGGER feed_ohlc_every_new_second
                   BEFORE INSERT
                   ON ticks
                   FOR EACH ROW
                   EXECUTE PROCEDURE feed_ohlc_1s();
    

    Now let's run a few example inserts:

    INSERT INTO ticks VALUES 
    ('2021-08-26 10:09:00.01'::timestamp, 'SYMBOL', 10.1, 100),
    ('2021-08-26 10:09:00.08'::timestamp, 'SYMBOL', 10.0, 100),
    ('2021-08-26 10:09:00.23'::timestamp, 'SYMBOL', 10.2, 100),
    ('2021-08-26 10:09:00.40'::timestamp, 'SYMBOL', 10.3, 100);
    table ticks;
    table ohlc_1s;
    

    As you can see, while we are still within the same second, no OHLC row is generated yet:

    ┌────────────────────────┬────────┬───────┬────────┐
    │          time          │ symbol │ price │ volume │
    ├────────────────────────┼────────┼───────┼────────┤
    │ 2021-08-26 10:09:00.01 │ SYMBOL │  10.1 │    100 │
    │ 2021-08-26 10:09:00.08 │ SYMBOL │  10.0 │    100 │
    │ 2021-08-26 10:09:00.23 │ SYMBOL │  10.2 │    100 │
    │ 2021-08-26 10:09:00.4  │ SYMBOL │  10.3 │    100 │
    └────────────────────────┴────────┴───────┴────────┘
    (4 rows)
    
    ┌──────┬────────┬───┬───┬───┬───┬───┐
    │ time │ symbol │ o │ h │ l │ c │ v │
    ├──────┼────────┼───┼───┼───┼───┼───┤
    └──────┴────────┴───┴───┴───┴───┴───┘
    (0 rows)
    

    Now let's insert two more seconds of ticks:

    INSERT INTO ticks VALUES 
    ('2021-08-26 10:09:01.02'::timestamp, 'SYMBOL', 10.0, 100),
    ('2021-08-26 10:09:01.04'::timestamp, 'SYMBOL', 14.0, 200),
    ('2021-08-26 10:09:01.42'::timestamp, 'SYMBOL', 12.3, 200),
    ('2021-08-26 10:09:01.62'::timestamp, 'SYMBOL', 8.3, 200),
    ('2021-08-26 10:09:02.80'::timestamp, 'SYMBOL', 9.0, 500);
    table ticks;
    table ohlc_1s;
    

    Here is the output:

    ┌────────────────────────┬────────┬───────┬────────┐
    │          time          │ symbol │ price │ volume │
    ├────────────────────────┼────────┼───────┼────────┤
    │ 2021-08-26 10:09:00.01 │ SYMBOL │  10.1 │    100 │
    │ 2021-08-26 10:09:00.08 │ SYMBOL │  10.0 │    100 │
    │ 2021-08-26 10:09:00.23 │ SYMBOL │  10.2 │    100 │
    │ 2021-08-26 10:09:00.4  │ SYMBOL │  10.3 │    100 │
    │ 2021-08-26 10:09:01.02 │ SYMBOL │  10.0 │    100 │
    │ 2021-08-26 10:09:01.04 │ SYMBOL │  14.0 │    200 │
    │ 2021-08-26 10:09:01.42 │ SYMBOL │  12.3 │    200 │
    │ 2021-08-26 10:09:01.62 │ SYMBOL │   8.3 │    200 │
    │ 2021-08-26 10:09:02.8  │ SYMBOL │   9.0 │    500 │
    └────────────────────────┴────────┴───────┴────────┘
    (9 rows)
    
    ┌─────────────────────┬────────┬──────┬──────┬──────┬──────┬─────┐
    │        time         │ symbol │  o   │  h   │  l   │  c   │  v  │
    ├─────────────────────┼────────┼──────┼──────┼──────┼──────┼─────┤
    │ 2021-08-26 10:09:00 │ SYMBOL │ 10.1 │ 10.3 │ 10.0 │ 10.3 │ 400 │
    │ 2021-08-26 10:09:01 │ SYMBOL │ 10.0 │ 14.0 │  8.3 │  8.3 │ 700 │
    └─────────────────────┴────────┴──────┴──────┴──────┴──────┴─────┘
    (2 rows)
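
    Note that the newest tick (2021-08-26 10:09:02.8) has not produced an ohlc_1s row yet, because the trigger only flushes a bucket once a tick from a later second arrives. If you also need the still-open second, a minimal sketch is to union the materialized rows with an on-the-fly aggregate of the ticks that have not been flushed yet (the view name ohlc_1s_rt is made up here, and it assumes the tables above):

    -- Materialized seconds plus the bucket the trigger has not flushed yet
    CREATE VIEW ohlc_1s_rt AS
    SELECT * FROM ohlc_1s
    UNION ALL
    SELECT time_bucket('1 second', time) AS time,
      symbol,
      FIRST(price, time) AS o,
      MAX(price) AS h,
      MIN(price) AS l,
      LAST(price, time) AS c,
      SUM(volume) AS v
    FROM ticks
    -- Only the ticks newer than the last materialized 1-second bucket
    WHERE time >= (SELECT coalesce(max(time) + INTERVAL '1 second', '-infinity') FROM ohlc_1s)
    GROUP BY 1, 2;

    Querying this view at this point would also show the pending 2021-08-26 10:09:02 bucket (o, h, l, c = 9.0 and v = 500), computed on the fly from the raw ticks.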
    

    Chaining this to bigger timeframes (for example, feeding a 1-minute table from ohlc_1s) should be just as straightforward; a sketch of one more level is below.
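
    For instance, here is a minimal sketch of one more level built the same way; the ohlc_1m table, the feed_ohlc_1m() function, and the trigger name are made up for illustration, and it keeps the same structure as the 1-second trigger above:

    CREATE TABLE ohlc_1m ( time TIMESTAMP NOT NULL, symbol varchar, o decimal, h decimal, l decimal, c decimal, v int);
    SELECT create_hypertable('ohlc_1m', 'time');

    CREATE OR REPLACE FUNCTION feed_ohlc_1m() RETURNS trigger AS
    $BODY$
    DECLARE
        last_time timestamp;
    BEGIN
       -- 1-minute bucket of the newest 1-second row already stored for this symbol
       SELECT time_bucket('1 minute', time) INTO last_time
       FROM ohlc_1s WHERE symbol = NEW.symbol
       ORDER BY time DESC LIMIT 1;

       -- The new 1-second row starts a later minute: flush the just-completed minute
       IF NEW.time - last_time >= INTERVAL '1 minute' THEN
          INSERT INTO ohlc_1m (time, symbol, o, h, l, c, v)
            SELECT time_bucket('1 minute', time) as time,
              symbol,
              FIRST(o, time) as open,
              MAX(h) as high,
              MIN(l) as low,
              LAST(c, time) as close,
              SUM(v) as volume
            FROM ohlc_1s
            WHERE symbol = NEW.symbol AND time >= last_time
            GROUP BY 1, 2 ORDER BY 1 DESC LIMIT 1;
       END IF;
       RETURN NEW;
    END;
    $BODY$
    LANGUAGE plpgsql;

    CREATE TRIGGER feed_ohlc_every_new_minute
                   BEFORE INSERT
                   ON ohlc_1s
                   FOR EACH ROW
                   EXECUTE PROCEDURE feed_ohlc_1m();

    Each level reads only the already-aggregated level below it, so the amount of data scanned per flush stays small no matter how coarse the top timeframe is, which is exactly the speedup asked about in the question.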