Search code examples
mysqlwhere-inmysql-5.6

Why 0 results on "where not in" query


mysql> show columns in m like 'fld';
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| fld   | varchar(45) | YES  | MUL | NULL    |       |
+-------+-------------+------+-----+---------+-------+
1 row in set (0.02 sec)

mysql> show columns in i like 'fld';
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| fld   | varchar(45) | YES  | MUL | NULL    |       |
+-------+-------------+------+-----+---------+-------+
1 row in set (0.02 sec)

mysql> select count(distinct fld) from i;
+---------------------+
| count(distinct fld) |
+---------------------+
|               27988 |
+---------------------+
1 row in set (0.03 sec)

mysql> select count(distinct fld) from m;
+---------------------+
| count(distinct fld) |
+---------------------+
|               72558 |
+---------------------+
1 row in set (0.07 sec)

The above results seem reasonable based on what I know of the tables in question.

mysql> select count(*) from m where fld not in (select fld from i);
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.11 sec)

That last result does not seem reasonable. There must be some rows in m with fld not in i! Can someone please explain why I get 0 as the result?


For completeness (because I suspect it may be relevant), I'll also paste this result:

mysql> select count(*) from m where fld is null;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Edit: In reply to comments, I'm editing in the following info also, in case it helps someone answer my question:

  • select count(*) from m join i using (fld) yields 9350; with left join, 73087; with right join, 28872.
  • select count(*) from i where fld is null yields 810.

Solution

  • I've now figured it out. The documentation reads:

    To comply with the SQL standard, IN returns NULL not only if the expression on the left hand side is NULL, but also if no match is found in the list and one of the expressions in the list is NULL.

    Thus, fld not in (i.e. not fld in) returns not null (i.e. null) whenever there's a null in the derived table (which there is here) and fld isn't found in the derived table (which is what I'm testing for).


    There may be a better workaround than this, but I'm using (select fld from i where fld is not null).