Search code examples
postgresqlquery-performancepostgresql-11

PostgreSQL subquery case sequential scan


I have 2 tables:

  1. requests: content 60 million records (Used as a log for the website)

  2. requests_hours: content few hundreds of rows (keep updating it every minute from requests tables)

I have the following simple query , but when I execute it, it takes a about 5 minutes to complete, because Postgres do not use the index of the column request_time_utc and just does a sequential scan.

SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= (SELECT MAX(request_hour_utc) FROM requests_hours)

enter image description here

but if I just remove the subquery (which, by itself, is executed in 0.003s) and replace it with a static value like in the following, I get the following query executed in only 0.008s:

SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= '2019-09-30 17:00:00'

enter image description here

The query should count just a few rows every minute, from 1000 ~ 7000, so certainly an index scan on the column request_time_utc should be much better than a sequential scan.

I didn't understand how to force PostgreSQL to do an index scan for the first query.

The above queries are for simplifying the issue; here is the original one:

SELECT 
            customer_id,
            DATE_TRUNC('hour', request_time_utc) AS request_hour_utc,
            COUNT(request_id) AS total_requests,
            SUM(data_in_size) AS total_data_in_size,
            SUM(data_out_size) AS total_data_out_size,
            SUM(process_long) AS total_process_long
            FROM requests
            WHERE request_time_utc >= (SELECT MAX(request_hour_utc) FROM requests_hours)
                        AND customer_id IS NOT NULL
            GROUP BY request_hour_utc , customer_id
            ORDER BY request_hour_utc DESC;

Solution

  • Move your subquery to CTE, like this (I'm writing this from my phone on the train so you will need to land at the correct query :-) ):

    WITH your_max AS (SELECT MAX(request_hour_utc) as foo FROM requests_hours)
    SELECT COUNT(request_id)
    FROM requests CROSS JOIN your_max
    WHERE request_time_utc >= your_max.foo