I'm stuck with a performance issue:
A shop has an article filter with categories "color", "size", "gender" and "feature". All those details are stored inside an article_criterias
table, that looks like this:
Table layout of article_criterias
is; this table has about 36.000 rows:
article_id | group | option | option_val
100 | "size" | "35" | 35.00
100 | "size" | "36" | 36.00
100 | "size" | "36½" | 36.50
100 | "color" | "40" | 40.00
100 | "color" | "50" | 50.00
100 | "gender" | "1" | 1.00
101 | "size" | "40" | 40.00
...
We have a SQL query that is built dynamically, based on which criteria are currently selected. The query is good for 2-3 criteria, but will get very slow when selecting more than 5 options (each additional INNER JOIN roughly doubles the execution time)
How can we make this SQL faster, maybe even replacing the inner joins with a more performant concept?
This is the query (the logic is correct, just the performance is bad):
-- This SQL is generated when the user selected the following criteria
-- gender: 1
-- color: 80 + 30
-- size 36 + 37 + 38 + 39 + 42 + 46
SELECT
criteria.group AS `key`,
criteria.option AS `value`
FROM articles
INNER JOIN article_criterias AS criteria ON articles.id = criteria.article_id
INNER JOIN article_criterias AS criteria_gender
ON criteria_gender.article_id = articles.id AND criteria_gender.group = "gender"
INNER JOIN article_criterias AS criteria_color1
ON criteria_color1.article_id = articles.id AND criteria_color1.group = "color"
INNER JOIN article_criterias AS criteria_size2
ON criteria_size2.article_id = articles.id AND criteria_size2.group = "size"
INNER JOIN article_criterias AS criteria_size3
ON criteria_size3.article_id = articles.id AND criteria_size3.group = "size"
INNER JOIN article_criterias AS criteria_size4
ON criteria_size4.article_id = articles.id AND criteria_size4.group = "size"
INNER JOIN article_criterias AS criteria_size5
ON criteria_size5.article_id = articles.id AND criteria_size5.group = "size"
INNER JOIN article_criterias AS criteria_size6
ON criteria_size6.article_id = articles.id AND criteria_size6.group = "size"
INNER JOIN article_criterias AS criteria_size7
ON criteria_size7.article_id = articles.id AND criteria_size7.group = "size"
WHERE
AND (criteria_gender.option IN ("1"))
AND (criteria_color1.option IN ("80", "30"))
AND (criteria_size2.option_val BETWEEN 35.500000 AND 36.500000)
AND (criteria_size3.option_val BETWEEN 36.500000 AND 37.500000)
AND (criteria_size4.option_val BETWEEN 37.500000 AND 38.500000)
AND (criteria_size5.option_val BETWEEN 38.500000 AND 39.500000)
AND (criteria_size6.option_val BETWEEN 41.500000 AND 42.500000)
AND (criteria_size7.option_val BETWEEN 45.500000 AND 46.500000)
Key/value tables are really a nuisance. However, in order to find certain criteria matches aggregate your data:
select
a.*,
ac.group AS "key",
ac.option AS "value"
from articles a
join article_criterias ac on ac.article_id = a.article_id
where a.article_id in
(
select article_id
from article_criterias
group by article_id
having sum("group" = 'gender' and option = '1') > 0
and sum("group" = 'color' and option in ('30','80')) > 0
and sum("group" = 'size' and option_val between 35.5 and 36.5) > 0
and sum("group" = 'size' and option_val between 36.5 and 37.5) > 0
and sum("group" = 'size' and option_val between 37.5 and 38.5) > 0
and sum("group" = 'size' and option_val between 38.5 and 39.5) > 0
and sum("group" = 'size' and option_val between 41.5 and 42.5) > 0
and sum("group" = 'size' and option_val between 45.5 and 46.5) > 0
)
order by a.article_id, ac.group, ac.option;
This gets you all articles that are available for gender 1, colors 30 and/or 80, and all listed size ranges, along with all their options. (The size ranges are a bit strange, though; a size 36.5 would meet two ranges for instance.) You get the idea: group by article_id and use HAVING
in order to only get article_ids that meet the critria.
As to indexes you'll want
create index idx on article_criterias(article_id, "group", option, option_val);