Search code examples
sqlpostgresqlpostgresql-performance

Filter out rows in 1:N LEFT JOIN where any row in child table fails condition


I have two tables, one which I will call D and one which I will call S, with a 1:N relationship between D and S. I currently have the following query that runs on these two:

SELECT d.*, s.*
FROM d
LEFT JOIN s ON s.d_id = d.id
WHERE coalesce(:status, d.status) = d.status

Now, this works well, but I would like to be able to filter on s.block_start and s.block_end, which are date times. That seems straightforward, but my goal is to filter out any row in D that corresponds to a row in S which doesn't fall within the date time bounds I set. My first thought was to join the above with the following query:

SELECT id
FROM (
    SELECT d.id, MIN(s.block_start), MAX(s.block_end)
    FROM d
    LEFT JOIN s ON s.d_id = d.id
    GROUP BY d.id
) WHERE block_start >= :start AND block_end <= :end

but this seems like a very inefficient query for what I'm trying to do. Does anyone know of the best-practice way to achieve this result?


Solution

  • While there aren't huge numbers of rows in table s for each d.id, I expect this query to perform best:

    SELECT *  -- retrieve all columns?
    FROM   d
    LEFT   JOIN s ON s.d_id = d.id
    WHERE  NOT EXISTS (
       SELECT FROM s
       WHERE  s.d_id = d.id
       AND   (s.block_start < :start
           OR s.block_end   > :end)
       )
    -- AND  (d.status = :status OR :status IS NULL)
    

    ... when supported by one multicolumn index like:

    CREATE INDEX s_d_id_start_end ON s (d_id, block_start, block_end);
    

    Your use of LEFT JOIN indicates you don't want to exclude rows from d that have no related rows in s. Pay attention not to void that plan with conditions that demand rows from s after all. (Applies to both answers so far.)

    If the table is vacuum'ed enough, you'll get index-only scans, where Postgres walks all index rows for each d_id, and filters for start & end. It can stop as soon as the first offending row is encountered. So many offending rows are good for the performance of this query.

    The query is versatile in regard to what you want to retrieve eventually.
    And it works well with or without filter on status.

    More optimization is possible, depending on undisclosed details.