Search code examples
sqldatabasetime-seriesquestdb

Plotting overlaid data from past week


I have this very simple query below. I can add it to my Grafana and plot a chart representing the activity in 1 hour intervals, and if I add a time filter, I can scope it to today. All good.

SELECT timestamp as time, server, count(rate)
FROM
  mytable
WHERE
    server = 'mysrv'  SAMPLE BY 1h

I can then change the time filter to display the last 7 days, but in this case I see the activity for each day sequentially, which makes sense of course. What I would prefer is to overlay each of the past 7 days on top of the current day, so I can compare the activity for each day at a given hour.

I thought this would be easy, but after trying my luck using window functions, unions and everything I could think of, didn't manage to get a working solution. I have the feeling I can do some date arithmetics to normalise this, but cannot really find the way.


Solution

  • A bit of date arithmetic and we are done here.

    In the first subquery (sampled) I am just filtering the past 7 days and doing the sample per hour.

    On the next one (sampled_offsets), we can calculate the difference in days between each row and today. I am using this sql for that datediff('d', timestamp_floor('d',timestamp), timestamp_floor('d',now())) as days_ago. Note that I don't want to know the real difference in days. If I run this today at 9AM and I have a row from yesterday at 11PM, I want the difference to be '1' to know that the row belongs to 1 day ago, even if technically this is within 24 hours. So I am using timestamp_floor on both timestamps to get the right offset.

    Now in the next subquery (normalized) I can add to each row timestamp, the offset in days, so I am overlaying the data as if it was happening today, but at the right timestamp. After this query, I would have rows only showing dates for today, but with a days_ago column telling me in reality it belongs to a different day.

    And now in the final SELECT I can just pivot the results, group (implicit in QuestDB) by time and server, and just assign to the right column depending on the offset in days.

    The final result has 24 rows, one for each hour in the day, and 7 columns, one each representing the avg for that particular timestamp in each of the past 7 days.

    WITH sampled AS (
    SELECT timestamp, server, avg(rate) as rate
    FROM  mytable
    WHERE server = 'Calumet Beach'
    AND ( timestamp >= dateadd('d', -7, timestamp_floor('d', now())) AND timestamp < timestamp_ceil('d', now()) )
    SAMPLE by 1h FILL(NULL)
    ), sampled_offsets AS (
    SELECT timestamp, server, rate, 
      datediff('d',  timestamp_floor('d',timestamp),  timestamp_floor('d',now())) as days_ago
    from sampled
      ),
      normalized AS (
    SELECT server, rate, dateadd('d', days_ago::int, timestamp) as time,timestamp, days_ago
    from sampled_offsets
    )
    select time, server, 
    SUM(CASE WHEN days_ago = 0 THEN rate END) AS rate_current,
    SUM(CASE WHEN days_ago = 1 THEN rate END) AS rate_1_ago,
    SUM(CASE WHEN days_ago = 2 THEN rate END) AS rate_2_ago,
    SUM(CASE WHEN days_ago = 3 THEN rate END) AS rate_3_ago,
    SUM(CASE WHEN days_ago = 4 THEN rate END) AS rate_4_ago,
    SUM(CASE WHEN days_ago = 5 THEN rate END) AS rate_5_ago,
    SUM(CASE WHEN days_ago = 6 THEN rate END) AS rate_6_ago,
     from normalized
    ;
    

    Now, if I execute this query I can plot each column as a different series and they overlap in time.

    Chart representing a timeseries per day at hourly intervals