Search code examples
mysqlsqlquery-optimizationcollationquery-planner

Why does removing the BINARY function call from my SQL query change the query plan so dramatically?


I have a SQL query that looks for a specific value in a table and then does inner joins across three tables to fetch the result set. The three tables are fabric_barcode_oc, fabric_barcode_items & fabric_barcode_rolls

Initial Query

The initial version of the query is below

EXPLAIN ANALYZE
SELECT `oc`.`oc_number` AS `ocNumber` , `roll`.`po_number` AS `poNumber` ,
`item`.`item_code` AS `itemCode` , `roll`.`roll_length` AS `rollLength` ,
`roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE BINARY `roll`.`roll_number` = 'dZkzHJ_je8'

When running EXPLAIN ANALYZE on this, I get the following

"-> Nested loop inner join  (cost=468160.85 rows=582047) (actual time=0.063..254.186 rows=1 loops=1)
    -> Nested loop inner join  (cost=264444.40 rows=582047) (actual time=0.057..254.179 rows=1 loops=1)
        -> Filter: (cast(roll.roll_number as char charset binary) = 'dZkzHJ_je8')  (cost=60727.95 rows=582047) (actual time=0.047..254.169 rows=1 loops=1)
            -> Table scan on roll  (cost=60727.95 rows=582047) (actual time=0.042..198.634 rows=599578 loops=1)
        -> Single-row index lookup on oc using PRIMARY (oc_unique_id=roll.oc_unique_id)  (cost=0.25 rows=1) (actual time=0.009..0.009 rows=1 loops=1)
    -> Single-row index lookup on item using PRIMARY (item_unique_id=roll.item_unique_id_fk)  (cost=0.25 rows=1) (actual time=0.006..0.006 rows=1 loops=1)
"

Updated Query

I then changed the query to

EXPLAIN ANALYZE
SELECT `oc`.`oc_number` AS `ocNumber` , `roll`.`po_number` AS `poNumber` ,
`item`.`item_code` AS `itemCode` , `roll`.`roll_length` AS `rollLength` ,
`roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE `roll`.`roll_number` = 'dZkzHJ_je8'

and this generates the following execution plan

"-> Rows fetched before execution  (cost=0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)

The only difference between the two queries is that I removed the BINARY function call from the query. I'm confused by why the plan is so different?

Execution Times

Query 1 had an execution time of ~375ms while the second query had an execution time of ~160ms.

What is causing this difference?

UPDATE

Including the table schema definition for fabric_barcode_rolls as requested

fabric_barcode_rolls,"CREATE TABLE `fabric_barcode_rolls` (
  `roll_unique_id` int NOT NULL AUTO_INCREMENT,
  `oc_unique_id` int NOT NULL,
  `item_unique_id_fk` int NOT NULL,
  `roll_number` char(30) NOT NULL,
  `roll_length` decimal(10,2) DEFAULT '0.00',
  `po_number` char(22) DEFAULT NULL,
  `roll_utilized` decimal(10,2) DEFAULT '0.00',
  `user` char(30) NOT NULL,
  `mir_number` char(22) DEFAULT NULL,
  `mir_location` char(10) DEFAULT NULL,
  `mir_stamp` datetime DEFAULT NULL,
  `creation_stamp` datetime DEFAULT CURRENT_TIMESTAMP,
  `update_stamp` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`roll_unique_id`),
  UNIQUE KEY `roll_number` (`roll_number`),
  KEY `fabric_barcode_item_fk` (`item_unique_id_fk`),
  CONSTRAINT `fabric_barcode_item_fk` FOREIGN KEY (`item_unique_id_fk`) REFERENCES `fabric_barcode_items` (`item_unique_id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=610684 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci"

Solution

  • Your performance difference is due to this fact: in MySQL, collations on VARCHAR() and CHAR() columns are baked into the indexes.

    Edit updated to match the table definition.

    Your fabric_barcode_rolls table has a column defined like this:

    roll_number char(30) NOT NULL,
    ...
    UNIQUE KEY roll_number (roll_number).
    

    So, your WHERE ... BINARY roll.roll_number = 'dZkzHJ_je8' filter clause is not sargable: it can't use the index on that column. But WHERE ... roll.roll_number = 'dZkzHJ_je8' is sargable: it does use the index. So it's fast. But the column's default collation is case-insensitive. So, it's fast and wrong.

    That can be fixed.

    Notice there's no collation declaration on the column. That means it's using the table's default: utf8mb4_0900_ai_ci, a case-insensitive collation.

    What you want for an ordinary barcode column is a one-byte-per-character charset and a case-sensitive collation. This would change your table to do that.

     ALTER TABLE fabric_barcode_rolls
    CHANGE  roll_number 
            roll_number CHAR(30) COLLATE latin1_bin NOT NULL;
    

    This is a multilevel win. Using the correct character set for your barcodes saves data. It makes the indexes shorter and more efficient to use. It does case-sensitive (binary-match) lookups, which themselves make indexes shorter and much more efficient to use. And it doesn't run the collision risk between barcodes with upper and lower case character sets.

    Before you conclude that the collision risk is so low you don't have to worry about it, please read about the birthday paradox.