Search code examples
mysqlsqlindexingdatabase-administration

How to Optimize MYSQL in Extra :-Using where; Using temporary; Using filesort


What is the proper indexing for this query.

I tried given different combinations of indexes for this query but it is still using from using tempory , using filesort etc.

Total table data - 7,60,346

product= 'Dresses' - Total rows = 122 554

CREATE TABLE IF NOT EXISTS `product_data` (
  `table_id` int(11) NOT NULL AUTO_INCREMENT,
  `id` int(11) NOT NULL,
  `price` int(11) NOT NULL,
  `store` varchar(255) NOT NULL,
  `brand` varchar(255) DEFAULT NULL,
  `product` varchar(255) NOT NULL,
  `model` varchar(255) NOT NULL,
  `size` varchar(50) NOT NULL,
  `discount` varchar(255) NOT NULL,
  `gender_id` int(11) NOT NULL,
  `availability` int(11) NOT NULL,
  PRIMARY KEY (`table_id`),
  UNIQUE KEY `table_id` (`table_id`),
  KEY `id` (`id`),
  KEY `discount` (`discount`),
  KEY `step_one` (`product`,`availability`),
  KEY `step_two` (`product`,`availability`,`brand`,`store`),
  KEY `step_three` (`product`,`availability`,`brand`,`store`,`id`),
  KEY `step_four` (`brand`,`store`),
  KEY `step_five` (`brand`,`store`,`id`)
) ENGINE=InnoDB ;

Query :

SELECT id ,store,brand FROM `product_data` WHERE product='dresses' and 
availability='1' group by brand,store order by store limit 10;

excu..time :- (10 total, Query took 1.0941 sec)

EXPLAIN PLAN :


possible_keys :- step_one, step_two, step_three, step_four, step_five

key :- step_two

ref :- const,const

rows :- 229438

Extra :-Using where; Using temporary; Using filesort

I tried these indexes


Key step_one (product,availability)

Key step_two (product,availability,brand,store)

Key step_three (product,availability,brand,store,id)

Key step_four (brand,store)

Key step_five (brand,store,id)


Solution

  • The real problem is not the index, but the mismatch between GROUP BY and ORDER BY preventing taking advantage of LIMIT.

    This

    INDEX(product, availability, store, brand, id)
    

    will be "covering" and in the right order. But note that I have swapped store and brand...

    Change the query to

    SELECT  id ,store,brand
        FROM  `product_data`
        WHERE  product='dresses'
          and  availability='1'
        GROUP BY store, brand    -- change
        ORDER BY store, brand    -- change
        limit  10;
    

    That changes the GROUP BY to start with store, to reflect the ORDER BY ordering -- this avoid an extra sort. And it changes the ORDER BY to be identical to the GROUP BY so that the two can be combined.

    Given those changes, the INDEX can now go all the way through to the LIMIT, thereby allowing the processing to look at only 10 rows, not a much larger set.

    Anything less than all these changes will not be as efficient.

    Further discussion:

    INDEX(product, availability,   -- these two can be in either order
          store, brand,      -- must match both `GROUP BY` and `ORDER BY`
          id)   -- tacked on (on the end) to make it "covering"
    

    "Covering" means that all the columns for the SELECT are found in the INDEX, so no need to reach over into the data.

    But... The whole query does not make sense because of the inclusion of id in the SELECT. If you want to find what stores have available dresses, then get rid of id. If you want to list all the available dresses, then change id to GROUP_CONCAT(id).