Tags: sql, postgresql, query-optimization, grafana

Limit result rows to a minimum time interval in PostgreSQL


Background: I am running TeslaMate/Grafana to monitor my car's status; one of the gauges plots the battery level fetched from the database. My server is located remotely and runs in Docker on an old NAS, so both query performance and network overhead matter.

I found that the kiosk page frequently hangs, and from investigation it seems to be caused by the query -- two of the plots return 10~100k rows of results from the database. I want to limit the number of rows returned by the SQL queries, as the plots certainly don't need that much precision to draw such detailed intervals.

I tried to follow this answer and use row_number() to keep only every 100th row of the results, but a more complicated issue turned up: the time intervals between rows are not consistent.

The car has 4 statuses: driving / online / asleep / offline.

  1. If the car is in the driving status, the time interval can be less than 200 ms, as the car pushes its status whenever it has new data.
  2. If the car is in the online status, the time interval can be several minutes, as the system actively fetches the status from the car.
  3. Even worse, if the system thinks the car is going to sleep and needs to stop fetching its status (so as not to prevent the car from sleeping), the interval can be up to 40 minutes, depending on settings.
  4. If the car is asleep or offline, no data is recorded at all.

This obviously makes skipping every n-th row a bad idea: in cases 2-4 above, lots of data points would go missing, so Grafana could not plot a graph representing the battery level at satisfactory precision.

I wonder if it's possible to skip rows based on the time interval from a datetime field, rather than by row_number(), without much query overhead? I.e., fetch only rows that come at least 1000 ms after the previously fetched row.

E.g., with the following data in the table, I want rows 1, 4 and 5 returned.

row     date
[1] 1610000001000
[2] 1610000001100
[3] 1610000001200
[4] 1610000002000
[5] 1610000005000
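In other words, the filter I'm after behaves like this quick Python sketch (the function name is mine, purely for illustration):

```python
def filter_by_min_gap(dates, min_gap_ms=1000):
    """Keep a row only if it is at least min_gap_ms after the last kept row."""
    kept = []
    for d in dates:
        if not kept or d - kept[-1] >= min_gap_ms:
            kept.append(d)
    return kept

# Sample data from above: Unix timestamps in milliseconds.
dates = [1610000001000, 1610000001100, 1610000001200, 1610000002000, 1610000005000]
print(filter_by_min_gap(dates))
# [1610000001000, 1610000002000, 1610000005000] -- rows 1, 4 and 5
```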

The current (problematic) method I am using is as follows:

SELECT $__time(t.date), t.battery_level AS "SOC [%]"
FROM (
  SELECT date, battery_level, row_number() OVER (ORDER BY date ASC) AS row
  FROM (
    SELECT battery_level, date
    FROM positions
    WHERE car_id = $car_id AND $__timeFilter(date)
    UNION ALL
    SELECT battery_level, date
    FROM charges c
    JOIN charging_processes p ON p.id = c.charging_process_id
    WHERE $__timeFilter(date) AND p.car_id = $car_id
  ) AS data
  ORDER BY date ASC
) AS t
WHERE t.row % 100 = 0;

This method clearly has the problem that it just returns every n-th row instead of what I want (e.g. every other row when the last line reads t.row % 2 = 0), regardless of how far apart the rows are in time.
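To make the issue concrete: running the same modulo sampling over my sample data above (a quick Python sketch, not the actual query) shows the sparse point getting dropped:

```python
# Sample data from the question: Unix timestamps in milliseconds.
dates = [1610000001000, 1610000001100, 1610000001200, 1610000002000, 1610000005000]

# row_number() starts at 1; "t.row % 2 = 0" keeps every second row.
sampled = [d for i, d in enumerate(dates, start=1) if i % 2 == 0]
print(sampled)
# [1610000001100, 1610000002000]
```

Row 5 (1610000005000) is dropped even though it is the only sample in a 3-second window, while rows only 100 ms apart survive.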

PS: please ignore the table structures and the UNION in the sample code; I haven't dug deep enough into the tables, where there could be other tweaks, but that's irrelevant to this question anyway.

Thanks in advance!


Solution

  • You can use a recursive CTE:

    WITH RECURSIVE rec(cur_row, cur_date) AS (
        (
            -- anchor: the earliest row in the time range
            SELECT row, date
            FROM t
            ORDER BY date
            LIMIT 1
        )
    
        UNION ALL
    
        (
            -- recursive step: the first row at least 1000 ms after the last kept one;
            -- the recursion stops once no such row exists
            SELECT row, date
            FROM t
            JOIN rec
              ON t.date >= cur_date + 1000
            ORDER BY t.date
            LIMIT 1
        )
    )
    SELECT *
    FROM rec;
    
    cur_row cur_date
    1 1610000001000
    4 1610000002000
    5 1610000005000

    View on DB Fiddle


    Using a function instead would probably be faster:

    CREATE OR REPLACE FUNCTION f() RETURNS SETOF t AS
    $$
    DECLARE
        row      t%ROWTYPE;
        cur_date BIGINT;  -- date of the last returned row (NULL before the first)
    BEGIN
        -- single ordered pass over the table
        FOR row IN
            SELECT *
            FROM t
            ORDER BY date
        LOOP
            -- return the first row, then every row at least 1000 ms after the last returned one
            IF row.date >= cur_date + 1000 OR cur_date IS NULL
            THEN
                cur_date := row.date;
                RETURN NEXT row;
            END IF;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;
    

    SELECT *
    FROM f();
    
    row date
    1 1610000001000
    4 1610000002000
    5 1610000005000