Search code examples
mysqlsqlperformancequery-optimizationmysql-cluster

Like and '=' equals to operator performance in sql - mysql cluster


MySQL-Cluster-gpl-7.4.12-1.el6.x86_64

Query 1

    select
    userinfo.username,
    userinfo.firstname,
    userinfo.lastname,
    userinfo.email,
    radcheck.attribute,
    radcheck.`value`,
    radusergroup.groupname,
    userinfo.id,
    userinfo.account_type,
    userinfo.workphone,
    userinfo.homephone,
    userinfo.mobilephone,
    userinfo.address,
    userinfo.zone,
    userinfo.account_state,
    userinfo.link_type,
    userinfo.device_owner,
    userinfo.email
    FROM
    userinfo
    INNER JOIN radcheck ON userinfo.username = radcheck.username
    INNER JOIN radusergroup ON userinfo.username = radusergroup.username
    WHERE
    radcheck.attribute='Expiration'  AND userinfo.mobilephone LIKE '9876543210%'

This query takes about 8s to complete Below is the explain output for query

    +----+-------------+--------------+------+----------------------+-----------+---------+----------------------------+------+-------------+
    | id | select_type | table        | type | possible_keys        | key       | key_len | ref                        | rows | Extra       |
    +----+-------------+--------------+------+----------------------+-----------+---------+----------------------------+------+-------------+
    |  1 | SIMPLE      | radcheck     | ref  | username,attribute   | attribute | 34      | const                      |    9 | Using where |
    |  1 | SIMPLE      | userinfo     | ref  | username,mobilephone | username  | 131     | ctradius.radcheck.username |   13 | Using where |
    |  1 | SIMPLE      | radusergroup | ref  | username             | username  | 66      | ctradius.userinfo.username |   10 | Using where |
    +----+-------------+--------------+------+----------------------+-----------+---------+----------------------------+------+-------------+

Query 2

Below is the query that is finished in < 0.005s

    select
    userinfo.username,
    userinfo.firstname,
    userinfo.lastname,
    userinfo.email,
    radcheck.attribute,
    radcheck.`value`,
    radusergroup.groupname,
    userinfo.id,
    userinfo.account_type,
    userinfo.workphone,
    userinfo.homephone,
    userinfo.mobilephone,
    userinfo.address,
    userinfo.zone,
    userinfo.account_state,
    userinfo.link_type,
    userinfo.device_owner,
    userinfo.email
    FROM
    userinfo
    INNER JOIN radcheck ON userinfo.username = radcheck.username
    INNER JOIN radusergroup ON userinfo.username = radusergroup.username
    WHERE
    radcheck.attribute like 'Expiration%'  AND userinfo.mobilephone LIKE '9876543210%'

Below is the explain out put for the query

 +----+-------------+--------------+-------+----------------------+-------------+---------+----------------------------+------+------------------------+
    | id | select_type | table        | type  | possible_keys        | key         | key_len | ref                        | rows | Extra                  |
    +----+-------------+--------------+-------+----------------------+-------------+---------+----------------------------+------+------------------------+
    |  1 | SIMPLE      | userinfo     | range | username,mobilephone | mobilephone | 203     | NULL                       |  585 | Using where; Using MRR |
    |  1 | SIMPLE      | radusergroup | ref   | username             | username    | 66      | ctradius.userinfo.username |   10 | Using where            |
    |  1 | SIMPLE      | radcheck     | ref   | username,attribute   | username    | 66      | ctradius.userinfo.username |   17 | Using where            |
    +----+-------------+--------------+-------+----------------------+-------------+---------+----------------------------+------+------------------------+

Question

Why is the query execution faster when like operator is used (2nd query) instead of the = (1st query)?


Solution

  • You have two completely different execution plans.

    For your first query, MySQL looks up all entries with radcheck.attribute='Expiration' through the index on radcheck.attribute (and assumes there are 9 rows that fit). It will then join the other tables using the username (and the indexes on username) for each pssible username, then reads the value of userinfo.mobilephone from the table and sees if it fits.

    For your second query, it will check the index on userinfo.mobilephone for anything that begins with 9876543210 (assuming it will find 585 rows). It will then join the other tables using the username (and the indexes on username) for all usernames with the correct mobilephone, then reads the value of radcheck.attribute from the table and sees if it fits.

    Your second query will of course be a lot faster when you actually only have a handful of rows that start with your mobile number, but a lot of rows with radcheck.attribute='Expiration', because it will have to do the rest of the execution (joining, and escpecially reading from the table) with a lot less rows (though you have to have a lot more rows than displayed in your explain to justify 8 seconds).

    MySQL has to guess which way is faster and chooses a way based on your query and some statistical data about your tables. It chose completely wrong.

    In your first query, MySQL assumed that looking up = in an index is better than looking up like in an index (and would yield just 9 rows, which was obviously not correct). In your second query, it just had to choose if looking up like in the first or in the second index is better - and guessed correctly. But if you would e.g. have looked for a userinfo.mobilephone LIKE '0%', it could have been slower (depending on your data).

    There are some things you can try:

    • use optimize table userinfo, radcheck, radusergroup. This will recreate your indexes and statistics. Depending on your data, it might no longer assume just 9 rows for your first query.
    • force mysql to always join in the order of your 2nd query with STRAIGHT_JOIN:

      ...
      from radcheck
      straight_join userinfo ON userinfo.username = radcheck.username
      straight_join radusergroup ON userinfo.username = radusergroup.username
      where ...
      

      But this might result in slower queries in cases where you are looking for e.g. userinfo.mobilephone LIKE '0%', or no condition on userinfo.mobilephone at all, so test it under different circumstances.

    • Keep using like 'Expiration%', you can even switch it to = in cases where you have no or just a short mobilephone to look for, though it is not guaranteed to always use the second way then (and might change with other mysql versions).

    By the way, you might get some additional performance if you would replace your username-columns by an (integer) id, make it (the first part of) the primary key of your three tables, and use this id to join your tables, because your indexes would get a lot smaller, and you e.g. wouldn't have to look up the username after you found values for Expiration. This should reduce your execution time for your first query (based on some assumption about your tables and data, I would guess by more than 50% - but it's just a guess, and it can be just 5% too). But since it doesn't justify changing your whole application for that (although other queries would profit from that too), you might consider it when you have to do some big changes in your code anyway. (You can simulate some part of it and try if it might be worth the effort by adding the index radcheck(attribute, username), this should give you 30%-50% for your first query - assuming username is not your primary key already)