Search code examples
mysqlperformancesql-order-by

MySQL: Difference between LIKE 123 and = 123 regarding INDEX usage


I am encountering a strange behavior on a select statement.

Assume the following table structure with some million entries:

CREATE TABLE `obj` (
  `obj__id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `obj__obj_type__id` int(10) unsigned DEFAULT NULL,
  `obj__title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__const` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__description` text COLLATE utf8_unicode_ci,
  `obj__created` datetime DEFAULT NULL,
  `obj__created_by` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__updated` datetime DEFAULT NULL,
  `obj__updated_by` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__property` int(10) unsigned DEFAULT '0',
  `obj__status` int(10) unsigned DEFAULT '1',
  `obj__sysid` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__scantime` datetime DEFAULT NULL,
  `obj__imported` datetime DEFAULT NULL,
  `obj__hostname` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `obj__undeletable` int(1) unsigned NOT NULL DEFAULT '0',
  `obj__rt_cf__id` int(11) unsigned DEFAULT NULL,
  `obj__cmdb_status__id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`obj__id`),
  KEY `obj_FKIndex1` (`obj__obj_type__id`),
  KEY `obj_ibfk_2` (`obj__cmdb_status__id`),
  KEY `obj__sysid` (`obj__sysid`),
  KEY `obj__title` (`obj__title`),
  KEY `obj__const` (`obj__const`),
  KEY `obj__hostname` (`obj__hostname`),
  KEY `obj__status` (`obj__status`),
  KEY `obj__updated_by` (`obj__updated_by`)
) ENGINE=InnoDB AUTO_INCREMENT=7640131 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

A very simple select with two conditions ordering by obj__title with a limit of 500 performs quiet slow (500ms):

SELECT SQL_NO_CACHE * FROM obj WHERE (obj__status = 2) AND (obj__obj_type__id = 59) ORDER BY obj__title ASC LIMIT 0, 500;

Without the "ORDER BY obj__title" it runs like a charm (<1ms).

EXPLAIN SELECT is telling me that MySQL is performing a filesort and not using the obj__title index

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  obj index_merge obj_FKIndex1,obj__status    obj_FKIndex1,obj__status    5,5 NULL    1336    Using intersect(obj_FKIndex1,obj__status); Using where; Using filesort

When i am forcing the index obj__title to use with FORCE or USE INDEX, mysql is not using the other indexes resulting in a very poor performance again. It kinda looks like that the poor performance has something to do with the combination of the two conditions and the order by.

Now that i spend hours on investigating on optimizing this query i came up with a very simple change: I exchanged the operator of my conditions from = to LIKE. So my query is:

EXPLAIN SELECT SQL_NO_CACHE * FROM obj WHERE (obj__status LIKE 2) AND (obj__obj_type__id LIKE 59) ORDER BY obj__title ASC LIMIT 0, 500;

This is what happened..

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  obj index   obj_FKIndex1,obj__status    obj__title  768 NULL    500 Using where

Query performance is a lot faster: 150ms

I am not really happy with the speed but at least it is performing ok.

But what I would really like to know is why LIKE is using the index, and = does not? I did not find any hints on that on the MySQL documentation. Only a few notes on LIKE being case insensitive and LIKE acting a bit different for VARCHARS > 255, or any other CHAR or TEXT fields.. No single word about it's behavior on integer fields.

Can someone shed light on this situation? Any Database design or query tips to speed up the query more are very welcome as well!


Solution

  • For this query:

    SELECT SQL_NO_CACHE *
    FROM obj
    WHERE (obj__status = 2) AND (obj__obj_type__id = 59)
    ORDER BY obj__title ASC
    LIMIT 0, 500;
    

    The best index is obj(obj__status, obj__obj_type__id, obj__title).

    Otherwise, I would expect an index on one of the two where fields.

    However, when you use like, you are comparing numbers to strings. This generally prevents an index from being used. The only possible index is for the order by, which happens to work in your case.

    But, the proper index should have better performance.