Search code examples
mysqlsqlquery-optimization

How to optimize MySQL select query or make it faster


I have a select query, that selects over 50k records from MySQL 5.5 database at once, and this amount is expected to grow. The query contains multiple subquery which is taking over 120s to execute.

Initially some of the sale_items and stock tables didn't have more that the ID keys, so I added some more:

SELECT
  `p`.`id` AS `id`,
  `p`.`Name` AS `Name`,
  `p`.`Created` AS `Created`,
  `p`.`Image` AS `Image`,
  `s`.`company` AS `supplier`,
  `s`.`ID` AS `supplier_id`,
  `c`.`name` AS `category`,
  IFNULL((SELECT
    SUM(`stocks`.`Total_Quantity`)
  FROM `stocks`
  WHERE (`stocks`.`Product_ID` = `p`.`id`)), 0) AS `total_qty`,
  IFNULL((SELECT
    SUM(`sale_items`.`quantity`)
  FROM `sale_items`
  WHERE (`sale_items`.`product_id` = `p`.`id`)), 0) AS `total_sold`,
  IFNULL((SELECT
    SUM(`sale_items`.`quantity`)
  FROM `sale_items`
  WHERE ((`sale_items`.`product_id` = `p`.`id`) AND `sale_items`.`Sale_ID` IN (SELECT
    `refunds`.`Sale_ID`
  FROM `refunds`))), 0) AS `total_refund`
  FROM ((`products` `p`
    LEFT JOIN `cats` `c`
      ON ((`c`.`ID` = `p`.`cat_id`)))
    LEFT JOIN `suppliers` `s`
      ON ((`s`.`ID` = `p`.`supplier_id`)))

This is the explain result

+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
| id | select_type        | table      | type           | possible_keys          | key                    | key_len | ref     | rows  | Extra                    |
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
|  1 | PRIMARY            | <derived2> | ALL            | NULL                   | NULL                   | NULL    | NULL      | 20981 |                          |
|  2 | DERIVED            | p          | ALL            | NULL                   | NULL                   | NULL    | NULL      | 20934 |                          |
|  2 | DERIVED            | c          | eq_ref         | PRIMARY                | PRIMARY                | 4       | p.cat_id      |     1 |                          |
|  2 | DERIVED            | s          | eq_ref         | PRIMARY                | PRIMARY                | 4       | p.supplier_id |     1 |                          |
|  5 | DEPENDENT SUBQUERY | sale_items  | ref            | sales_items_product_id | sales_items_product_id | 5       | p.id   |    33 | Using where              |
|  6 | DEPENDENT SUBQUERY | refunds    | index_subquery | IDX_refunds_sale_id    | IDX_refunds_sale_id    | 5       | func    |     1 | Using index; Using where |
|  4 | DEPENDENT SUBQUERY | sale_items  | ref            | sales_items_product_id | sales_items_product_id | 5       | p.id   |    33 | Using where              |
|  3 | DEPENDENT SUBQUERY | stocks     | ref            | IDX_stocks_product_id  | IDX_stocks_product_id  | 5       | p.id    |     1 | Using where              |
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------

I am expecting that the query takes less that 3s at most, but I can't seem to figure out the best way to optimize this query.


Solution

  • The query looks fine to me. You select all data and aggregate some of it. This takes time. Your explain plan shows there are indexes on the IDs, which is good. And at a first glance there is not much we seem to be able to do here...

    What you can do, though, is provide covering indexes, i.e. indexes that contain all columns you need from a table, so the data can be taken from the index directly.

    create index idx1 on cats(id, name);
    create index idx2 on suppliers(id, company);
    create index idx3 on stocks(product_id, total_quantity);
    create index idx4 on sale_items(product_id, quantity, sale_id);
    

    This can really boost your query.

    What you can try About the query itself is to move the subqueries to the FROM clause. MySQL's optimizer is not great, so although it should get the same execution plan, it may well be that it favors the FROM clause.

    SELECT
      p.id,
      p.name,
      p.created,
      p.image,
      s.company as supplier,
      s.id AS supplier_id,
      c.name AS category,
      COALESCE(st.total, 0) AS total_qty,
      COALESCE(si.total, 0) AS total_sold,
      COALESCE(si.refund, 0) AS total_refund
    FROM products p
    LEFT JOIN cats c ON c.id = p.cat_id
    LEFT JOIN suppliers s ON s.id = p.supplier_id
    LEFT JOIN
    (
      SELECT SUM(total_quantity) AS total
      FROM stocks 
      GROUP BY product_id
    ) st ON st.product_id = p.id
    LEFT JOIN
    (
      SELECT
        SUM(quantity) AS total,
        SUM(CASE WHEN sale_id IN (SELECT sale_id FROM refunds) THEN quantity END) as refund
      FROM sale_items
      GROUP BY product_id
    ) si ON si.product_id = p.id;
    

    (If sale_id is unique in refunds, then you can even join it to sale_items. Again: this should usually not make a difference, but in MySQL it may still. MySQL was once notorious for treating IN clauses much worse than the FROM clause. This may not be the case anymore, I don't know. You can try - if refunds.sale_id is unique).