Search code examples
mysqlperformanceindexingbooleantinyint

MySQL index on TINYINT column: why is there a difference between checking '= 0' and checking 'is false'?


MySQL is providing different execution plans for a query depending on whether I check that an indexed TINYINT field 'is false' or '= 0'. The table's name is ordini (means 'shipments') and the index that I want MySQL to use is shipmentslistrequest

SHOW CREATE ordini (I omitted most columns and indexes, for simplicity):

CREATE TABLE `ordini` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dataIns` datetime DEFAULT NULL,
  `hasLDV` tinyint(1) NOT NULL DEFAULT '0',
  `isAnnullato` tinyint(1) NOT NULL DEFAULT '0',
  `isEsportatoSAM` tinyint(1) DEFAULT '0',
  `id_azienda_aux` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `dataIns` (`dataIns`),
  KEY `id_azienda_aux` (`id_azienda_aux`),
  KEY `shipmentslistrequest` (`id_azienda_aux`,`isEsportatoSAM`,
                              `hasLDV`,`isAnnullato`,`dataIns`)
) ENGINE=InnoDB AUTO_INCREMENT=5007359 DEFAULT CHARSET=latin1

Query1:

EXPLAIN  select  *
        from  ordini
        where  id_azienda_aux = 92
          and  isEsportatoSAM = 0
          and  isAnnullato = 0
          and  hasLDV = 1
          and  dataIns >= '2020-04-28'
          and  dataIns <  '2020-05-19';

id|select_type|table |type |possible_keys                              |key                 |key_len|ref|rows|Extra      |
--|-----------|------|-----|-------------------------------------------|--------------------|-------|---|----|-----------|
 1|SIMPLE     |ordini|range|dataIns,id_azienda_aux,shipmentslistrequest|shipmentslistrequest|17     |   | 138|Using where|

Query2 (change: isEsportatoSAM is false):

EXPLAIN   select  *
        from  ordini
        where  id_azienda_aux = 92
          and  isEsportatoSAM is false
          and  isAnnullato = 0
          and  hasLDV = 1
          and  dataIns >= '2020-04-28'
          and  dataIns <  '2020-05-19';

id|select_type|table |type |possible_keys                              |key    |key_len|ref|rows  |Extra      |
--|-----------|------|-----|-------------------------------------------|-------|-------|---|------|-----------|
 1|SIMPLE     |ordini|range|dataIns,id_azienda_aux,shipmentslistrequest|dataIns|9      |   |205920|Using where|

Query3 (change: isEsportatoSAM = 0 and isAnnullato is false):

EXPLAIN     select  *
        from  ordini
        where  id_azienda_aux = 92
          and  isEsportatoSAM = 0
          and  isAnnullato is false
          and  hasLDV = 1
          and  dataIns >= '2020-04-28'
          and  dataIns <  '2020-05-19';

id|select_type|table |type|possible_keys                              |key                 |key_len|ref              |rows|Extra      |
--|-----------|------|----|-------------------------------------------|--------------------|-------|-----------------|----|-----------|
 1|SIMPLE     |ordini|ref |dataIns,id_azienda_aux,shipmentslistrequest|shipmentslistrequest|7      |const,const,const| 206|Using where|

My guess is that:

Query1 will properly take advantage of the shipmentslistrequest index.

Query2 can't use it beyond the first column because I am comparing isEsportatoSAM to false (but why?. Does the fact that the column admits nulls have any relevance?), and hence chooses a different index.

Query3 will use the index up to (and including) the third column, but no further because of the 'is false' comparison, and that's why the three 'const' in the ref column, and that's why the join type is 'ref' instead of 'range' (it doesn't get to the date range part of the index). Is this interpretation correct?

Can anyone please solve these questions for me?

PS: The MySQL version is 5.5.56

PS2: From a design point of view, I am aware that it makes no sense to allow nulls in those tinyint columns (I didn't design this system).


Solution

  • This is just a guess, but I suspect it's because IS boolean_value is designed to convert any datatype to its boolean value before testing it. The query optimizer should be able to tell that IS FALSE is equivalent to = 0 when the column is any kind of INT, and then use the index as in the first query, but apparently MySQL doesn't implement that optimization.

    If you want great query optimization, MySQL isn't generally the DB of choice.