Search code examples
sqlpostgresqlaws-aurora-serverless

General question on optimising a database for a large query


I have a database that stores data from sensors in a factory. The DB contains about 1.6 million rows per sensor per day. I have the following index on the DB.

CREATE INDEX sensor_name_time_stamp_index ON sensor_data (time_stamp, sensor_name);

I will be running the following query once per day.

SELECT
    time_stamp, value
FROM
    (SELECT
        time_stamp,
        value,
        lead(value) OVER (ORDER BY value DESC) as prev_result
     FROM
        sensor_data WHERE time_stamp between '2021-02-24' and '2021-02-25' and sensor_name = 'sensor8'
     ORDER BY
        time_stamp DESC) as result
WHERE
    result.value IS DISTINCT FROM result.prev_result
ORDER BY
    result.time_stamp DESC;

The query returns a list of rows where the value is different from the previous row.

This query takes about 23 seconds to run.

Running on PostgreSQL 10.12 on Aurora serverless.

Questions: Besides the index, are there any other optimisations that I can perform on the DB to make the query run faster?


Solution

  • To support the query optimally, the index must be defined the other way around:

    CREATE INDEX ON sensor_data (sensor_name, time_stamp);
    

    Otherwise, PostgreSQL will have to read all index values in the time interval, then discard the ones for the wrong sensor, then fetch the rows from the table.

    With the proper column order, only the required rows are scanned in the index.

    You asked for other optimizations: Since you have to sort rows, increasing work_mem can be beneficial. Other than that, more memory and faster storage will definitely not harm.