Tags: sql, postgresql, query-optimization, grafana

Limit result rows to a minimum time interval in PostgreSQL


Background: I am running TeslaMate/Grafana to monitor my car's status; one of the gauges plots the battery level fetched from the database. My server is located remotely and runs in Docker on an old NAS, so both query performance and network overhead matter.

I found that the kiosk page frequently hangs, and from investigation it seems to be caused by the query -- two of the plots return 10~100k rows of results from the database. I want to limit the number of rows returned by the SQL queries, as the plots certainly don't need that much precision to draw such detailed intervals.

I tried to follow this answer and use row_number() to keep only every 100th row of the results, but a more complicated issue turned up: the time intervals between rows are not consistent.

The car has 4 statuses: driving / online / asleep / offline.

  1. If the car is in the driving status, the time interval can be less than 200 ms, as the car pushes its status whenever it has new data.
  2. If the car is in the online status, the time interval can be several minutes, as the system actively fetches the status from the car.
  3. Even worse, if the system thinks the car is going to sleep and needs to stop fetching its status (so as not to prevent the car from sleeping), the interval can be up to 40 minutes, depending on settings.
  4. If the car is asleep or offline, no data is recorded at all.

This obviously makes skipping every n-th row a bad idea: in cases 2-4 above, lots of data points would go missing, so Grafana could not plot a graph representing the battery level at satisfactory precision.

I wonder if it's possible to skip rows based on the time interval from a datetime field, rather than by row_number(), without much query overhead? I.e., fetch only rows that come at least 1000 ms after the previously fetched row.

E.g., with the following data in the table, I want rows 1, 4 and 5 returned.

row     date
[1] 1610000001000
[2] 1610000001100
[3] 1610000001200
[4] 1610000002000
[5] 1610000005000
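In other words, the filter I'm after behaves like this quick Python sketch (the function name is mine, purely for illustration):

```python
def filter_by_min_gap(dates, min_gap_ms=1000):
    """Keep a row only if it is at least min_gap_ms after the last kept row."""
    kept = []
    for d in dates:
        if not kept or d - kept[-1] >= min_gap_ms:
            kept.append(d)
    return kept

# Sample data from above: Unix timestamps in milliseconds.
dates = [1610000001000, 1610000001100, 1610000001200, 1610000002000, 1610000005000]
print(filter_by_min_gap(dates))
# [1610000001000, 1610000002000, 1610000005000] -- rows 1, 4 and 5
```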

The current (problematic) method I am using is as follows:

SELECT $__time(t.date), t.battery_level AS "SOC [%]"
FROM (
  SELECT date, battery_level, row_number() OVER (ORDER BY date ASC) AS row
  FROM (
    SELECT battery_level, date
    FROM positions
    WHERE car_id = $car_id AND $__timeFilter(date)
    UNION ALL
    SELECT battery_level, date
    FROM charges c
    JOIN charging_processes p ON p.id = c.charging_process_id
    WHERE $__timeFilter(date) AND p.car_id = $car_id
  ) AS data
  ORDER BY date ASC
) AS t
WHERE t.row % 100 = 0;

This method clearly has the problem that it just returns every n-th row instead of what I want (e.g. every other row when the last line reads t.row % 2 = 0), regardless of how far apart the rows are in time.
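To make the issue concrete: running the same modulo sampling over my sample data above (a quick Python sketch, not the actual query) shows the sparse point getting dropped:

```python
# Sample data from the question: Unix timestamps in milliseconds.
dates = [1610000001000, 1610000001100, 1610000001200, 1610000002000, 1610000005000]

# row_number() starts at 1; "t.row % 2 = 0" keeps every second row.
sampled = [d for i, d in enumerate(dates, start=1) if i % 2 == 0]
print(sampled)
# [1610000001100, 1610000002000]
```

Row 5 (1610000005000) is dropped even though it is the only sample in a 3-second window, while rows only 100 ms apart survive.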

PS: please ignore the table structures and the UNION in the sample code; I haven't dug deep enough into the tables, where there could be other tweaks, but that's irrelevant to this question anyway.

Thanks in advance!


Solution

  • You can use a recursive CTE:

    WITH RECURSIVE rec(cur_row, cur_date) AS (
        (
            -- anchor: the earliest row in the time range
            SELECT row, date
            FROM t
            ORDER BY date
            LIMIT 1
        )
    
        UNION ALL
    
        (
            -- recursive step: the first row at least 1000 ms after the last kept one;
            -- the recursion stops once no such row exists
            SELECT row, date
            FROM t
            JOIN rec
              ON t.date >= cur_date + 1000
            ORDER BY t.date
            LIMIT 1
        )
    )
    SELECT *
    FROM rec;
    
    cur_row cur_date
    1 1610000001000
    4 1610000002000
    5 1610000005000

    View on DB Fiddle


    Using a function instead would probably be faster:

    CREATE OR REPLACE FUNCTION f() RETURNS SETOF t AS
    $$
    DECLARE
        row      t%ROWTYPE;
        cur_date BIGINT;  -- date of the last returned row (NULL before the first)
    BEGIN
        -- single ordered pass over the table
        FOR row IN
            SELECT *
            FROM t
            ORDER BY date
        LOOP
            -- return the first row, then every row at least 1000 ms after the last returned one
            IF row.date >= cur_date + 1000 OR cur_date IS NULL
            THEN
                cur_date := row.date;
                RETURN NEXT row;
            END IF;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;
    

    SELECT *
    FROM f();
    
    row date
    1 1610000001000
    4 1610000002000
    5 1610000005000