Search code examples
sqltimescaledb

How to use SQL-Timeseries time bucket to aggregate a table and also return a FK of the greatest row


Given the table (hypertable) with the columns peripheral (FK), data point type (FK), time and value, can the peripheral and/or data point type be included with each maximum aggregate? Can the top n (like top 3) FKs be included?

         Table "public.iot_datapoint"
       Column       |           Type
--------------------+--------------------------
 time               | timestamp with time zone
 value              | double precision
 data_point_type_id | uuid
 peripheral_id      | uuid
-- Current query
SELECT time_bucket('60.000s', "time") AS "time_bucket",
       MAX(iot_datapoint.value) AS "0"
FROM iot_datapoint
WHERE (peripheral_id IN (...)
       AND data_point_type_id IN (...)
       AND TIME >= %(A0)s
       AND TIME < %(A1)s)
GROUP BY 1
ORDER BY 1
-- Tried query
SELECT time_bucket('60.000s', "time") AS "time_bucket",
       MAX(iot_datapoint.value) AS "0", peripheral_id
FROM iot_datapoint
WHERE (peripheral_id IN (...)
       AND data_point_type_id IN (...)
       AND TIME >= %(A0)s
       AND TIME < %(A1)s)
GROUP BY 1, peripheral_id
ORDER BY 1

Adding peripheral_id in the SELECT requires it to be added to the GROUP BY but that prevents the maximum being returned for the whole time bucket.

-- Current result
      time_bucket       |       0       
------------------------+---------------
 2024-06-04 16:21:00+00 |            -2
 2024-06-04 16:22:00+00 |          -0.4
-- Tried result
      time_bucket       |       0       |            peripheral_id             
------------------------+---------------+--------------------------------------
 2024-06-04 16:21:00+00 |            -1 | b319b263-0526-4a26-b71a-39553869b2a0
 2024-06-04 16:21:00+00 |            -2 | 894b4df3-ec52-4358-b8fa-d2bbece6e8ff
 2024-06-04 16:22:00+00 |          -0.4 | b319b263-0526-4a26-b71a-39553869b2a0
 2024-06-04 16:22:00+00 |         1.234 | 894b4df3-ec52-4358-b8fa-d2bbece6e8ff
-- Ideal result (top 1 FK)
      time_bucket       |       0       |            max_1_peripheral_id             
------------------------+---------------+--------------------------------------
 2024-06-04 16:21:00+00 |            -2 | 894b4df3-ec52-4358-b8fa-d2bbece6e8ff
 2024-06-04 16:22:00+00 |          -0.4 | b319b263-0526-4a26-b71a-39553869b2a0
-- Ideal result (top 2 FK)
      time_bucket       |       0       |            max_1_peripheral_id       |   max_2_peripheral_id
------------------------+---------------+--------------------------------------+--------------------------------------
 2024-06-04 16:21:00+00 |            -2 | 894b4df3-ec52-4358-b8fa-d2bbece6e8ff | b319b263-0526-4a26-b71a-39553869b2a0
 2024-06-04 16:22:00+00 |          -0.4 | b319b263-0526-4a26-b71a-39553869b2a0 | 894b4df3-ec52-4358-b8fa-d2bbece6e8ff

Solution

  • try with subquery to rank peripherals by value, then join ranked peripherals to time buckets and use LEFT JOIN and COALESCE:

    WITH ranked_peripherals AS (
      SELECT 
        time_bucket('60.000s', "time") AS timeb,
        iot_datapoint.peripheral_id AS peripheral_id,
        MAX(iot_datapoint.value) AS max_value,
        ROW_NUMBER() OVER (PARTITION BY time_bucket('60.000s', "time") ORDER BY MAX(iot_datapoint.value) DESC) AS rank
      FROM iot_datapoint
      WHERE TIME >= '2024-06-04 16:21:00+00' AND TIME < '2024-06-04 16:22:59+00'
      GROUP BY time_bucket('60.000s', "time"), peripheral_id
    )
    SELECT
      DISTINCT ON (timebx)
      timebx,
      max_valuex as "0",
      COALESCE(ranked_peripherals_1.peripheral_id, '00000000-0000-0000-0000-000000000000') AS max_1_peripheral_id,
      COALESCE(ranked_peripherals_2.peripheral_id, '00000000-0000-0000-0000-000000000000') AS max_2_peripheral_id,
      COALESCE(ranked_peripherals_3.peripheral_id, '00000000-0000-0000-0000-000000000000') AS max_3_peripheral_id
    FROM (
      -- rank has to be selected or the lowest rank is returned. Safe?
      SELECT DISTINCT timeb AS timebx, rank AS rankx, max_value AS max_valuex FROM ranked_peripherals
    ) AS time_buckets
    LEFT JOIN ranked_peripherals AS ranked_peripherals_1 ON time_buckets.timebx = ranked_peripherals_1.timeb AND ranked_peripherals_1.rank = 1
    LEFT JOIN ranked_peripherals AS ranked_peripherals_2 ON time_buckets.timebx = ranked_peripherals_2.timeb AND ranked_peripherals_2.rank = 2
    LEFT JOIN ranked_peripherals AS ranked_peripherals_3 ON time_buckets.timebx = ranked_peripherals_3.timeb AND ranked_peripherals_3.rank = 3
    ORDER BY timebx;
    
             timebx         |  0  |         max_1_peripheral_id          |         max_2_peripheral_id          |         max_3_peripheral_id          
    ------------------------+-----+--------------------------------------+--------------------------------------+--------------------------------------
     2024-06-04 16:21:00+00 | 2.1 | dabb33ae-614f-491c-abd5-188a61102788 | 80eb066d-1778-4782-ab66-84f4808e7d13 | 894b4df3-ec52-4358-b8fa-d2bbece6e8ff
     2024-06-04 16:22:00+00 | 2.4 | 894b4df3-ec52-4358-b8fa-d2bbece6e8ff | b319b263-0526-4a26-b71a-39553869b2a0 | 00000000-0000-0000-0000-000000000000