Search code examples
databasetime-seriesquestdb

Query timestamp gaps, grouped by a Column value


I would like to check a table for unexpected data time-gaps, grouped by a column. For example is there a query for an input table:

Datetime,            Sym
(timestamp)         (symbol)

2024-10-01 08:00:00  AA
2024-10-01 08:00:00  CC
2024-10-01 08:00:00  BB
...
2024-10-01 08:01:00  AA
2024-10-01 08:01:00  BB
2024-10-01 08:01:00  CC
...
2024-10-01 08:02:00  AA
2024-10-01 08:03:00  BB
2024-10-01 08:04:00  CC

which yields result:

Datetime,            Sym, Time_gap
2024-10-01 08:04:00  CC    3
2024-10-01 08:03:00  BB    2
2024-10-01 08:02:00  AA    1
...
2024-10-01 08:01:00  AA    1
2024-10-01 08:01:00  BB    1
2024-10-01 08:01:00  CC    1

For each ‘Sym’ the time-gap between consecutive rows is calculated as ‘Time_gap’, with the timestamp of the event, and sorted DESC by ‘Time_gap’.

Note I showed ‘Time_gap’ as an integer (minutes), but any numeric value (seconds, timestamp) would suffice.

I know in other databases you can use the LAG function, but I see in QuestDB there is no such a function and one can use instead the first_value window function. My problem is when I try to pass a timestamp to that function I get an error because the timestamp type is not supported.


Solution

  • We can indeed use the first_value() window function to get the timestamp from the row before partitioned by Sym, but at the moment we need a workaround, as the function supports just a few data types. I opened a github issue about this, so it might be implemented in the future.

    The workaround consists just in casting the timestamp as long (it is an epoch) and then back to a timestamp on a second query. I am showing how to calculate the gap in both seconds and milliseconds below just for completeness, but of course you need only one of them. I am using in one case the datediff function and in the other just substracting both epochs and dividing to get ms.

    WITH time_and_prev AS (
      SELECT Datetime, Sym, 
      first_value(Datetime::long) 
          OVER (PARTITION BY Sym 
                ORDER BY Datetime 
                ROWS BETWEEN 1 PRECEDING 
                     AND 1 PRECEDING 
            ) AS prevTimestamp 
       FROM table where Datetime IN today()
    )
    SELECT Datetime, Sym, prevTimestamp::timestamp, 
           datediff('s', Datetime, prevTimestamp::timestamp) as gap_s, 
           (Datetime - prevTimestamp::timestamp) / 1000 as gap_ms 
    FROM time_and_prev 
    ORDER BY 4 DESC;