Search code examples
aggregate-functionsquestdbohlc

How do I aggregate open and close values in QuestDB using SAMPLE BY


I'm working on a price data project, trying to store OHLC 1 minute data and then sampling it over various intervals. I decided to try QuestDB for its neat time-series features such as SAMPLE BY but am running into a wall.

I want to use SAMPLE BY to query my data, which looks like this:

timestamp open high low close
"2021-09-10T19:43:21.657672Z" 2.0 4.0 1.0 3.0
"2021-09-15T06:12:42.267416Z" 3.0 6.0 3.0 6.0

For example, for the two rows above, which are in the same month, I would like to run SAMPLE BY 1M and aggregate the values for the month.

For high and low values the aggregate functions are min(low) and max(high), but I can't figure out what to do for open and close values.

I tried retrofitting this approach but got as far as the following without errors:

with ohlc as ( 
    select row_number() over (
        partition by cast(cast(timestamp as float) / 1000000 / 60 / 60 / 24 / 7 / 4.34524 as int)
        order by timestamp
    ), open, high, low, close, timestamp
    from eurusd
) 
select max(high) high,
    min(low) low,
    min(case when rn_asc = 1 then [open] end) as open
from ohlc

Some errors I encountered:

  1. Adding the second row_number() over [...] order by timestamp desc fails with a syntax error. If I remove the desc it works, so I just removed that column altogether since then it's the same as the first.
  2. Neither SAMPLE BY nor GROUP BY at the end works with "base query does not provide dedicated TIMESTAMP column" and "group by expression does not match anything select in statement" errors, respectively.
  3. QuestDB recognizes the CASE WHEN THEN but if I yield a column name in the THEN portion it doesn't work with no error message.

I'm currently stuck at this point having looked across the QuestDB tutorials, SO questions, and GitHub issues for a mention of this because it seems like a common operation, especially since they have written many price time-series data tutorials, but none address this.

I'm wondering if anyone has encountered this and has a solution? Thank you!


Solution

  • It will be first() and last() to get open and close.

    select max(high) high,
        min(low) low,
        first(open) as open,
        last(close) as close
    from ohlc
    SAMPLE BY 1M