I have this SQL query in Postgres and I am having an unexpected behavior with it:
SELECT
DISTINCT s.id,
(
SELECT string_agg(CAST(t_code AS TEXT), ',')
FROM (
SELECT DISTINCT ps.status
FROM products_status ps
WHERE
ps.status_transition_message_id IS NOT NULL AND
ps.enabled IS FALSE AND
ps.status_transition_message_id = stm_rejected.id
) AS t_code
) AS t_code
FROM (
SELECT
*,
row_number() OVER (PARTITION BY stm.shipment_id ORDER BY date) AS rn
FROM status_transition_message stm
WHERE
stm.final_status = 6 OR
stm.final_status = 7
) AS stm_rejected
JOIN shipment s ON s.id = stm_rejected.shipment_id
WHERE
stm_rejected.rn = 1 AND
stm_rejected.date BETWEEN :startDate AND :endDate;
The status_transition_message
table represents status transitions between X and Y for a given shipment. With this query, I am going through all shipments and getting those which had their 1st status transition to 6 or 7 (they can go through the same transition multiple times) between a given start and end date. For the shipments that respect this condition, I am doing a SELECT
for the t_code
field of products_status
specified in that status transition.
The problem with this query is that in the t_code
subquery, even when ps.status_transition_message_id IS NOT NULL
evaluates to false, Postgres is still evaluating ps.status_transition_message_id = stm_rejected.id
, even though it comes after an AND boolean operator. I know this is happening because when i remove the ps.status_transition_message_id = stm_rejected.id
part from the query, it executes much faster. And I am also 100% sure that every products_status
in the database that i am testing with has the status_transition_message_id
as NULL.
EDIT: Here's how a status_transition_message
row looks like:
[id] [date] [initial_status] [final_status] [shipment_id]
434 3/20/13 14:18 0 4 943
and a products_status
row:
[id] [status] [status_transition_message_id] [enabled] [shipment_id]
211 5 434 true 943
The status_transition_message_id
in the products_status
table is a foreign key to the status_transition_message
table, and this field can be null, which is why in the first clause of the WHERE
statement of the subquery I am checking whether it is NULL or not (so it doesn't have to be tested against stm_rejected
rows without need)
I don't know if it was clear in my question, but the query does the return the expected results. The problem is that it is unnecessarily evaluating AND clauses when the first one already evaluates to false, which is harming the query's performance.
Your hunch is correct: boolean expressions are not evaluated left-to-right. The expression evaluation rules (4.2.14) in the postgres docs state:
The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.
To force evaluation order you can use a common table expression (CTE), which will also make your query more readable. It tells the optimizer to not rewrite the expression, but materialize the results instead, kind of like a temp table. It's hard to tell without running and explaining the query on your actual data whether it will result in performance improvement or degradation. I would try out both sub-queries and CTEs.
In your particular case, however, there may not be a need for a correlated sub-query. I've rewritten the query to use an inner join, which should handle that logic more efficiently. I'm also using a CTE here, but for a different purpose, because my guess is that the transition to status 6 and 7 are a tiny subset of all transitions, and so performance may benefit from that early reduction of rows.
I also went out on a limb here and changed the logic to replace the distinct with an explicit grouping.
with
stm_rejected as (
select
id,
"date" as transition_date,
row_number() over (partition by shipment_id order by "date") as transition_rank
from
status_transition_message
where
final_status in (6, 7)
)
select
shipment.id as shipment_id,
string_agg(products_status.t_code, ',') as t_codes
from
shipment
inner join stm_rejected
on shipment.id = stm_rejected.shipment_id
inner join products_status
on stm_rejected.id = products_status.status_transition_message_id
where
stm_rejected.transition_rank = 1
and stm_rejected.transition_date between :startDate and :endDate
and products_status.enabled = false
group by
shipment.id
;