I have a friggin bizarre situation going on. One of my nightly queries, which usually takes 5 mins, took >12 hours. Here is the query:
SELECT Z.id,
Z.seoAlias,
GROUP_CONCAT(DISTINCT LOWER(A.include)) AS include,
GROUP_CONCAT(DISTINCT LOWER(A.exclude)) AS exclude
FROM df_productsbystore AS X
INNER JOIN df_product_variants AS Y ON Y.id = X.id_variant
INNER JOIN df_products AS Z ON Z.id = Y.id_product
INNER JOIN df_advertisers AS A ON A.id = X.id_store
WHERE X.isActive > 0
AND Z.id > 60301433
GROUP BY Z.id
ORDER BY Z.id
LIMIT 45000;
I ran an EXPLAIN and got the following:
+----+-------------+-------+--------+------------------------------------------------------------------------------------+-----------+---------+---------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------------------------------------------------------------------+-----------+---------+---------------------+------+---------------------------------+
| 1 | SIMPLE | A | ALL | PRIMARY | NULL | NULL | NULL | 365 | Using temporary; Using filesort |
| 1 | SIMPLE | X | ref | UNIQUE_variantAndStore,idx_isActive,idx_store | idx_store | 4 | foenix.A.id | 600 | Using where |
| 1 | SIMPLE | Y | eq_ref | PRIMARY,UNIQUE,idx_prod | PRIMARY | 4 | foenix.X.id_variant | 1 | Using where |
| 1 | SIMPLE | Z | eq_ref | PRIMARY,UNIQUE_prods_seoAlias,idx_brand,idx_gender2,fk_df_products_id_category_idx | PRIMARY | 4 | foenix.Y.id_product | 1 | NULL |
+----+-------------+-------+--------+------------------------------------------------------------------------------------+-----------+---------+---------------------+------+---------------------------------+
Which looked different to my development environment. The df_advertisers section looked fishy to me, so I deleted and recreated the index on the X.id_store column, and now the EXPLAIN looks like this and the query is fast again:
+----+-------------+-------+--------+------------------------------------------------------------------------------------+------------------------+---------+-------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------------------------------------------------------------------+------------------------+---------+-------------------+---------+-------------+
| 1 | SIMPLE | Z | range | PRIMARY,UNIQUE_prods_seoAlias,idx_brand,idx_gender2,fk_df_products_id_category_idx | PRIMARY | 4 | NULL | 2090691 | Using where |
| 1 | SIMPLE | Y | ref | PRIMARY,UNIQUE,idx_prod | UNIQUE | 4 | foenix.Z.id | 1 | Using index |
| 1 | SIMPLE | X | ref | UNIQUE_variantAndStore,idx_isActive,idx_id_store | UNIQUE_variantAndStore | 4 | foenix.Y.id | 1 | Using where |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 4 | foenix.X.id_store | 1 | NULL |
+----+-------------+-------+--------+------------------------------------------------------------------------------------+------------------------+---------+-------------------+---------+-------------+
It would appear that the index magically disappeared. Can anyone explain how this is possible? Am I mean to run a mysqlcheck
command or similar on a regular basis to avoid this kind of thing? I'm stumped!
Thanks
Next time, simply do ANALYZE TABLE df_productsbystore;
It will be very fast, and may solve the problem.
ANALYZE
recomputes the statistics on which the Optimizer depends for deciding, in this case, which table to start with. In rare situations, the stats get out of date and need a kick in the shin.
Caveat: I am assuming you are using InnoDB on a somewhat recent version. If you are using MyISAM, then ANALYZE
is needed more often.
Do you really need 45K rows? What will you do with so many?
A way to speed the query up (probably) is to do everything you can with X and Z in a subquery, then JOIN A
to do the rest:
SELECT XYZ.id, XYZ.seoAlias,
GROUP_CONCAT(DISTINCT LOWER(A.include)) AS include,
GROUP_CONCAT(DISTINCT LOWER(A.exclude)) AS exclude
FROM
(
SELECT Z.id, Z.seoAlias, X.id_store
FROM df_productsbystore AS X
INNER JOIN df_product_variants AS Y ON Y.id = X.id_variant
INNER JOIN df_products AS Z ON Z.id = Y.id_product
WHERE X.isActive > 0
AND Z.id > 60301433
GROUP BY Z.id -- may not be necessary ??
ORDER BY Z.id
LIMIT 45000
) AS XYZ
INNER JOIN df_advertisers AS A ON A.id = XYZ.id_store
GROUP BY ZYZ.id
ORDER BY XYZ.id;
Useful indexes:
Y: INDEX(id_product, id)
X: INDEX(id_variant, isActive, id_store)