Search code examples
sqldatabasetime-seriesquestdb

Remove outliers compared to recent data in my table


We have many UIs printing "Candle Stick Data" of trading pairs using our API and QuestDB. In some markets, there are very low volume trades which move the candle a lot (sometimes due to low liquidity and other factors).

Those are usually single trades with very low volume where the exchange rate differs a lot from the actual exchange rate. This makes Charts hard to use and would like to remove those from the chart. This is my current query, the problematic column is the amount:

SELECT 
    timestamp, symbol, 
    first(price) AS open,
    last(price) AS close,
    min(price),
    max(price),
    sum(amount) AS volume
FROM trades
WHERE timestamp > dateadd('M', -1, now())
SAMPLE BY 1d ALIGN TO CALENDAR;

So basically the question is: is there a way to only select trades where the single traded volume is at least 1% (or more) of the total volume in the sampled interval?


Solution

  • You can use a window function to get, for example, the average, or the running sum, for the amount, and you can then SAMPLE BY in an outer query and compare the value of the sampled interval against the moving data. You can do this for the whole interval (when you don't specify ORDER BY and RANGE in the window definition), or you can make it relative to an interval in the past.

    For example this query would compare with the average of the past 6 days (7 days ago, but excluding the current row).

    with moving_trades AS (    
    SELECT timestamp, symbol, price, amount, 
           avg(amount) over (PARTITION BY symbol ORDER BY timestamp RANGE between 7 days PRECEDING AND 1 day preceding ) moving_avg_7_days
    FROM trades 
    WHERE timestamp > dateadd('d', -37, now())
    )
    SELECT 
        timestamp, symbol, 
        first(price) AS open,
        last(price) AS close,
        min(price),
        max(price),
        sum(amount) AS volume
    FROM moving_trades
    WHERE timestamp > dateadd('M', -1, now())
    moving_avg_7_days is not null and ABS(moving_avg_7_days - price) > moving_avg_7_days * 0.01
    SAMPLE BY 1d ALIGN TO CALENDAR;