Search code examples
mysqlleft-join

LEFT JOIN gives different result when changing the location of condition


I have this query:

SELECT
    bc.name,
    SUM(amount) AS total
FROM
    budget_categories bc
LEFT JOIN budget_items bi ON
    bc.id = bi.budget_category_id
LEFT JOIN balance_lines bl ON
    bl.budget_item_id = bi.id
WHERE
    bc.id = 25 AND family_id = 97109 AND bl.balance_id = 264553

The result is:

NULL | NULL

But when I put the condition in the LEFT JOIN part it works:

SELECT
    bc.name,
    SUM(amount) AS total
FROM
    budget_categories bc
LEFT JOIN budget_items bi ON
    bc.id = bi.budget_category_id
LEFT JOIN balance_lines bl ON
    bl.budget_item_id = bi.id  AND family_id = 97109 AND bl.balance_id = 264553
WHERE
    bc.id = 25

The result is:

Bannana | NULL

Why is the difference?


Solution

  • When you use an outer join (e.g. left outer join) you are asking for the query to return rows even if that table does not have an exact match. Hence there may be NULL values from the left outer joined columns.

    In the first where clause you ask for EVERY row in the overall result row to have bl.family_id = 97109 AND bl.balance_id = 264553 but here all the returned rows must have NULL bc.name and NULL amount. (btw: This is sometimes known as an implied inner join and in effect is equivalent to the query below:)

    SELECT
        bc.name,
        SUM(amount) AS total
    FROM
        budget_categories bc
    LEFT JOIN budget_items bi ON
        bc.id = bi.budget_category_id
    INNER JOIN balance_lines bl ON -- the where clause has this effect
        bl.budget_item_id = bi.id
    WHERE
        bc.id = 25 AND bl.family_id = 97109 AND bl.balance_id = 264553
    

    In the second query you allow only the rows of balance_lines to be restricted by the join conditions, so that more rows can be retuned from the other tables - and in those additional rows you get a bc.name of banana.

    When using left joins you must take care with any predicates that reference outer joined tables if you are to allow NULLs to be returned e.g.

    SELECT
        bc.name,
        SUM(amount) AS total
    FROM
        budget_categories bc
    LEFT JOIN budget_items bi ON
        bc.id = bi.budget_category_id
    LEFT JOIN balance_lines bl ON
        bl.budget_item_id = bi.id
    WHERE
        bc.id = 25 
        AND (bl.family_id = 97109 OR bl.family_id IS NULL)
        AND (bl.balance_id = 264553 OR bl.balance_id IS NULL)
    

    This revised where clause will behave just like your second query now.