Search code examples
sqlpostgresqlquery-optimization

Will the output of the following two queries be same and why the performance of first one is considerably better than the second one?


Query 1:

select *  
from a
              LEFT JOIN p ON a.product_id = p.web_id
              LEFT JOIN m ON p.manufacturer_id = m.web_id
              LEFT JOIN a add ON a.address_id = add.web_id
              LEFT JOIN c ON c.web_id = a.owner_id
              LEFT JOIN a ca ON c.main_address_id = ca.web_id
              LEFT JOIN d ON a.web_id = d.equipment_id
              LEFT JOIN s ON d.last_segment_id = s.web_id
              LEFT JOIN oh ON s.order_header_id = oh.web_id
              LEFT JOIN f ON s.folder_id = f.web_id
              LEFT JOIN ve on a.web_id = ve.equipment_id
where ve.date_updated >= now() - interval '10000 min'

Query 2:

select *  
from a
              LEFT JOIN p ON a.product_id = p.web_id
              LEFT JOIN m ON p.manufacturer_id = m.web_id
              LEFT JOIN ve on a.web_id = ve.equipment_id
              LEFT JOIN a add ON a.address_id = add.web_id
              LEFT JOIN c ON c.web_id = a.owner_id
              LEFT JOIN a ca ON c.main_address_id = ca.web_id
              LEFT JOIN d ON a.web_id = d.equipment_id
              LEFT JOIN s ON d.last_segment_id = s.web_id
              LEFT JOIN oh ON s.order_header_id = oh.web_id
              LEFT JOIN f ON s.folder_id = f.web_id
where ve.date_updated >= now() - interval '10000 min'

The only difference between the two queries is the position of the left join with ve table.

Am I right to believe that the output of both the queries will always be the same?

The query 1 is able to execute in under 5 seconds while the query 2 goes on for about 4mins. The total number of records in table a is around 20million. Is there a reason why query 1 is faster?


Solution

  • The order of the join conditions does not matter, and the queries are semantically identical. The difference in execution time must be connected to join_collapse_limit:

    The planner will rewrite explicit JOIN constructs (except FULL JOINs) into lists of FROM items whenever a list of no more than this many items would result. Smaller values reduce planning time but might yield inferior query plans.

    By default, this variable is set the same as from_collapse_limit, which is appropriate for most uses. Setting it to 1 prevents any reordering of explicit JOINs. Thus, the explicit join order specified in the query will be the actual order in which the relations are joined. Because the query planner does not always choose the optimal join order, advanced users can elect to temporarily set this variable to 1, and then specify the join order they desire explicitly. For more information see Section 14.3.

    Note that the query joins more than eight tables.

    If you have many joins like that, and you don't want to rewrite them, and you don't mind the optimizer to take more time, you can increase join_collapse_limit and from_collapse_limit.