database · time-series · questdb

Best way to automate downsampling data + deleting old data in QuestDB


I have a table receiving several hundred million rows of market data per day. The data has microsecond resolution, and I use it to display both real-time charts and historical data.

This is my table schema:

CREATE TABLE 'trades' ( 
    symbol SYMBOL CAPACITY 256 CACHE,
    side SYMBOL CAPACITY 256 CACHE,
    price DOUBLE,
    amount DOUBLE,
    timestamp TIMESTAMP
) timestamp(timestamp) PARTITION BY DAY WAL
WITH maxUncommittedRows=500000, o3MaxLag=2000000us;

I want to save space and optimize queries by downsampling the data into a second table, and deleting data older than 3 days from the live table. I would like to downsample like this:

SELECT timestamp, symbol, side, first(price) AS open, last(price) AS close,
    min(price), max(price), sum(amount) AS volume
FROM trades
SAMPLE BY 1m;

I see you can use tools like Airflow to detach partitions and move data elsewhere. Would an Airflow automation be the best course of action for downsampling my table and deleting old data as well?


Solution

  • You can use tools like Airflow or Dagster to automate this workflow, but since the workflow is quite simple, you can instead use the recently released TTL and materialized views features.

    First, we create a materialized view:

    CREATE MATERIALIZED VIEW trades_ohlc_1m AS (
        SELECT timestamp, symbol, side, first(price) AS open, last(price) AS close,
            min(price), max(price), sum(amount) AS volume
        FROM trades
        SAMPLE BY 1m
    ) PARTITION BY DAY;
    

    Now, every time new data arrives, the view will be refreshed and the data will be stored sampled by the minute.

    We can now add a TTL to the original table:

    ALTER TABLE trades SET TTL 4 DAYS;
    

    If we want to downsample older data even further, we can also set a TTL on the materialized view to remove its old data (for example, after 6 months), and we can create a second materialized view that stores older data at coarser intervals, for example one hour or one day.
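
    As a sketch of that last step (the view name `trades_ohlc_1m`, the retention periods, the `1h` interval, and the monthly partitioning are illustrative choices, not from the original answer):

    ```sql
    -- Keep only 6 months of 1-minute bars in the first materialized view
    -- (assumes it was created with the name 'trades_ohlc_1m').
    ALTER MATERIALIZED VIEW trades_ohlc_1m SET TTL 6 MONTHS;

    -- A coarser, hourly view for long-term history, fed from the same base table.
    -- Hourly rows are far fewer, so partitioning by month keeps partitions a sensible size.
    CREATE MATERIALIZED VIEW trades_ohlc_1h AS (
        SELECT timestamp, symbol, side, first(price) AS open, last(price) AS close,
            min(price), max(price), sum(amount) AS volume
        FROM trades
        SAMPLE BY 1h
    ) PARTITION BY MONTH;
    ```

    With this in place, the live `trades` table holds only recent raw ticks, the minute-level view covers the medium term, and the hourly view retains long-term history at a fraction of the storage cost.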