Search code examples
sqlpostgresqlsubquerycorrelated-subquery

Correlated Subquery not having expected behavior


I have this SQL query in Postgres and I am having an unexpected behavior with it:

SELECT
  DISTINCT s.id,
  (
    SELECT string_agg(CAST(t_code AS TEXT), ',') 
    FROM (
      SELECT DISTINCT ps.status
      FROM products_status ps
      WHERE
        ps.status_transition_message_id IS NOT NULL AND 
        ps.enabled IS FALSE AND
        ps.status_transition_message_id = stm_rejected.id
    ) AS t_code
  ) AS t_code
FROM (
  SELECT
    *,
     row_number() OVER (PARTITION BY stm.shipment_id ORDER BY date) AS rn 
  FROM status_transition_message stm
  WHERE
    stm.final_status = 6 OR
    stm.final_status = 7
) AS stm_rejected
JOIN shipment s ON s.id = stm_rejected.shipment_id
WHERE
  stm_rejected.rn = 1 AND
  stm_rejected.date BETWEEN :startDate AND :endDate;

The status_transition_message table represents status transitions between X and Y for a given shipment. With this query, I am going through all shipments and getting those which had their 1st status transition to 6 or 7 (they can go through the same transition multiple times) between a given start and end date. For the shipments that respect this condition, I am doing a SELECT for the t_code field of products_status specified in that status transition.

The problem with this query is that in the t_code subquery, even when ps.status_transition_message_id IS NOT NULL evaluates to false, Postgres is still evaluating ps.status_transition_message_id = stm_rejected.id, even though it comes after an AND boolean operator. I know this is happening because when i remove the ps.status_transition_message_id = stm_rejected.id part from the query, it executes much faster. And I am also 100% sure that every products_status in the database that i am testing with has the status_transition_message_id as NULL.

EDIT: Here's how a status_transition_message row looks like:

[id] [date]         [initial_status]  [final_status]    [shipment_id]

434  3/20/13 14:18   0                4                  943

and a products_status row:

[id] [status]  [status_transition_message_id]  [enabled]    [shipment_id]

211  5         434                             true          943

The status_transition_message_id in the products_status table is a foreign key to the status_transition_message table, and this field can be null, which is why in the first clause of the WHERE statement of the subquery I am checking whether it is NULL or not (so it doesn't have to be tested against stm_rejected rows without need)

I don't know if it was clear in my question, but the query does the return the expected results. The problem is that it is unnecessarily evaluating AND clauses when the first one already evaluates to false, which is harming the query's performance.


Solution

  • Your hunch is correct: boolean expressions are not evaluated left-to-right. The expression evaluation rules (4.2.14) in the postgres docs state:

    The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.

    To force evaluation order you can use a common table expression (CTE), which will also make your query more readable. It tells the optimizer to not rewrite the expression, but materialize the results instead, kind of like a temp table. It's hard to tell without running and explaining the query on your actual data whether it will result in performance improvement or degradation. I would try out both sub-queries and CTEs.

    In your particular case, however, there may not be a need for a correlated sub-query. I've rewritten the query to use an inner join, which should handle that logic more efficiently. I'm also using a CTE here, but for a different purpose, because my guess is that the transition to status 6 and 7 are a tiny subset of all transitions, and so performance may benefit from that early reduction of rows.

    I also went out on a limb here and changed the logic to replace the distinct with an explicit grouping.

    with
    
      stm_rejected as (
        select
          id,
          "date" as transition_date,
          row_number() over (partition by shipment_id order by "date") as transition_rank
        from
          status_transition_message
        where
          final_status in (6, 7)
      )
    
    select
      shipment.id as shipment_id,
      string_agg(products_status.t_code, ',') as t_codes
    from
      shipment
      inner join stm_rejected
        on shipment.id = stm_rejected.shipment_id
      inner join products_status
        on stm_rejected.id = products_status.status_transition_message_id
    where
      stm_rejected.transition_rank = 1
      and stm_rejected.transition_date between :startDate and :endDate
      and products_status.enabled = false
    group by
      shipment.id
    ;