Search code examples
mysqlleft-joinself-joinanti-join

Can't understand the mysql self left-join query


I come across a query which is,

SELECT it1.survey_set_id, it1.type, it1.value FROM survey_condition_filter it1 LEFT JOIN survey_condition_filter it2 ON(it1.survey_set_id = it2.survey_set_id AND it2.type = 3002) WHERE it1.type IN (2000, 2001, 2002) AND it2.value IS NULL;

Why is self left-join is used in the above query.

SELECT survey_set_id, type, value FROM survey_condition_filter WHERE type IN (2000, 2001, 2002);

isn't the above query is equivalent to the first query which used self left-join. since the query is just also filtering the IN (2000, 2001, 2002) AND it2.value IS NULL. I am confused the use of join query here and can't really understand the working of the first query.

This is table survey_condition_filter

+---------------+--------------+------+-----+---------+-------+
| Field         | Type         | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| survey_id     | bigint       | NO   | PRI | NULL    |       |
| survey_set_id | bigint       | NO   | PRI | NULL    |       |
| type          | int          | NO   | PRI | NULL    |       |
| condition     | tinyint      | NO   | PRI | NULL    |       |
| value         | varchar(15)  | NO   | PRI | NULL    |       |
| display_value | text         | NO   |     | NULL    |       |
| order         | int          | YES  |     | NULL    |       |
| created_at    | datetime     | NO   |     | NULL    |       |
| created_by    | varchar(255) | YES  |     | NULL    |       |
| updated_at    | datetime     | NO   |     | NULL    |       |
| updated_by    | varchar(255) | YES  |     | NULL    |       |
| deleted_at    | datetime     | YES  |     | NULL    |       |
| deleted_by    | varchar(255) | YES  |     | NULL    |       |
+---------------+--------------+------+-----+---------+-------+

Solution

  • This is an anti join. We outer join a table, but then only keep the outer joined rows (by applying WHERE ... it2.value IS NULL here). It is much more common - and much more readable in my opinion - to use a straight-forward NOT EXISTS (or NOT IN in many situations).

    SELECT it1.survey_set_id, it1.type, it1.value
    FROM survey_condition_filter it1 
    WHERE it1.type IN (2000, 2001, 2002) 
    AND NOT EXISTS
    (
      SELECT null
      FROM survey_condition_filter it2 
      WHERE it2.survey_set_id = it1.survey_set_id
      AND it2.type = 3002
    );
    

    What this query does is obvious: select all type 2000/2001/2002 rows where not exists a 3002 row for the same survey_set_id .

    If survey_set_id is a non-nullable column, you can use NOT IN, which gets the query even a tad shorter:

    SELECT it1.survey_set_id, it1.type, it1.value
    FROM survey_condition_filter it1 
    WHERE it1.type IN (2000, 2001, 2002) 
    AND itl.survey_set_id NOT IN
    (
      SELECT it2.survey_set_id
      FROM survey_condition_filter it2 
      WHERE it2.type = 3002
    );