Search code examples
mysqlleft-joinmariadbmysql-5.6mysql-5.7

MySQL JOIN not filtering on WHERE clause with < > operators, since moving from MySQL 5.6 -> 5.7


We're upgrading our DB systems to MySQL 5.7 coming from MySQL 5.6 and since the upgrade a few queries have been running really slow.

After some investigating we narrowed it down to a few JOIN queries which suddenly don't listen to the 'WHERE' clause anymore when using a 'larger than' > or 'smaller than' < operator. When using a '=' operator it does work as expected. When querying a large table this caused a constant 100% CPU usage.

The queries have been simplified to explain the issue at hand; when using explain we get the following outputs:

explain 
        select * from TableA as A
                left join
                (
                    select
                        DATE_FORMAT(created_at,'%H:%i:00') as `time`
                    FROM
                        TableB
                    WHERE
                        created_at < DATE_ADD(CURDATE(), INTERVAL -3 HOUR) 
                )
                as V ON V.time = A.time

Output

id  select_type table   partitions  type    possible_keys   key key_len ref rows    filtered    Extra
1   SIMPLE  A   NULL    ALL NULL    NULL    NULL    NULL    10080   100.00  NULL
1   SIMPLE  TableB  NULL    index   created_at  created_at  4   NULL    488389  100.00  Using where; Using index; Using join buffer (Block Nested Loop)

As you can see, it's querying/matching 488389 rows and not using the where clause since this is the total records in that table.

And now running the same query but with a LIMIT 99999999 command or using the '=' operator:

explain 
        select * from TableA as A
                left join
                (
                    select
                        DATE_FORMAT(created_at,'%H:%i:00') as `time`
                    FROM
                        TableB
                    WHERE
                        created_at < DATE_ADD(CURDATE(), INTERVAL -3 HOUR) LIMIT 999999999
                )
                as V ON V.time = A.time

Output

id  select_type table   partitions  type    possible_keys   key key_len ref rows    filtered    Extra
1   PRIMARY A   NULL    ALL NULL    NULL    NULL    NULL    10080   100.00  NULL
1   PRIMARY <derived2>  NULL    ALL NULL    NULL    NULL    NULL    244194  100.00  Using where; Using join buffer (Block Nested Loop)
2   DERIVED TableB  NULL    range   created_at  created_at  4   NULL    244194  100.00  Using where; Using index

You can see it's suddenly only matching '244194' rows which is a part of the table, or with the '=' operator:

id  select_type table   partitions  type    possible_keys   key key_len ref rows    filtered    Extra
1   SIMPLE  A   NULL    ALL NULL    NULL    NULL    NULL    10080   100.00  NULL
1   SIMPLE  TableB  NULL    ref created_at  created_at  4   const   1   100.00  Using where; Using index

Just 1 row, as expected.

So the question now is, have we been querying in a wrong way and just now finding out while upgrading or have things changed since MySQL 5.6? It seems odd that the = operator works, but the < and > are ignored for some reason, unless when using a LIMIT?..

We've searched around and couldn't find the cause of this issue, and we'd rather not use the limit 9999999 solution in our code for obvious reasons.

Note When running just the query inside the join, it works as expected as well.

Note We've also ran the same test on MariaDB 10.1, same issue.


Solution

  • In MySQL 5.7, derived tables (sub-queries in FROM clause) will be merged into the outer query if possible. This is usually an advantage since one avoids that the result of the sub-query is stored in a temporary table. However, for your query, MySQL 5.6 will create an index on this temporary table that could be used for the join execution.

    The problem with the merged query is that the index on TableB.created_at can not be used when the column is a parameter to a function. If you can change the query so that the transformation is made to the column on the left side of the join, an index can be used to access the table on the right side. Something like:

       select * from TableA as A
                left join
                (
                    select created_at as time
                    FROM TableB
                    WHERE created_at < DATE_ADD(CURDATE(), INTERVAL -3 HOUR) 
                )
                as V ON V.time = func(A.time)
    

    Alternatively, if you can use inner join instead of left join, MySQL can reverse the join order, so that the index on tableA.time can be used for the join.

    If the subquery use LIMIT, it can not be merged. Hence, by using LIMIT you will get the same query plan as was used in MySQL 5.6.