Search code examples
mysqlperformanceinnodb

How do I improve performance on a MySQL query with NULL?


I have several million records in the following table:

CREATE TABLE `customers` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `store_id` int(10) unsigned DEFAULT NULL,
  `first_name` varchar(64) DEFAULT NULL,
  `middle_name` varchar(64) DEFAULT NULL,
  `last_name` varchar(64) DEFAULT NULL,
  `email` varchar(128) DEFAULT NULL,
  `phone` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_store_email` (`store_id`,`email`),
  KEY `index_store_phone` (`store_id`,`phone`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Query #1 takes ~800ms:
SELECT COUNT(*) FROM `customers` WHERE `store_id` = 1;

Query #2 takes ~1.5ms:
SELECT COUNT(*) FROM `customers` WHERE `store_id` = 1 AND `email` IS NULL;

Query #3 takes a whopping 5 seconds:
SELECT COUNT(*) FROM `customers` WHERE `store_id` = 1 AND `email` IS NOT NULL;

Notes:

  • I've simplified the table to ask the question, but the query is identical.
  • Yes, my table is optimized.
  • Yes, both fields are indexed, see the create syntax above.
  • There are only a few store_ids, but every record has one.
  • There are very few customers with email set to null.

I find a few things strange here:

  1. Query #1 is simplest! There are only a few possible INT values. Shouldn't it be fastest?
  2. Why is Query #3 so slow? I could cut the time in half by doing the other two queries, and subtracting #1 from #2, but I shouldn't have to.

Any thoughts on this seemingly basic question? Feel like I'm missing something simple. Did I sleep through a class in db school?


Solution

  • At times the MySQL query parser guesses wrong when it decides which indices to use. For cases like these the index hints can be useful (http://dev.mysql.com/doc/refman/5.7/en/index-hints.html)

    To force the use of an index:

    SELECT * FROM table1 USE INDEX (col1_index,col2_index)
      WHERE col1=1 AND col2=2 AND col3=3;
    

    To force the use of an index including replacing table scans:

    SELECT * FROM table1 FORCE INDEX (col1_index,col2_index)
      WHERE col1=1 AND col2=2 AND col3=3;
    

    To ignore a certain index:

    SELECT * FROM table1 IGNORE INDEX (col3_index)
      WHERE col1=1 AND col2=2 AND col3=3;
    

    To debug which index is being used the EXPLAIN statement can be used: (https://dev.mysql.com/doc/refman/5.7/en/explain-output.html)

    EXPLAIN SELECT * FROM table1
      WHERE col1=1 AND col2=2 AND col3=3;