Search code examples
mysqljoinquery-performanceentity-attribute-value

Optimize SQL query with many inner joins on same table


I'm stuck with a performance issue:

A shop has an article filter with categories "color", "size", "gender" and "feature". All those details are stored inside an article_criterias table, that looks like this:

Table layout of article_criterias is; this table has about 36.000 rows:

article_id | group    | option | option_val
       100 | "size"   | "35"   |     35.00
       100 | "size"   | "36"   |     36.00
       100 | "size"   | "36½"  |     36.50
       100 | "color"  | "40"   |     40.00
       100 | "color"  | "50"   |     50.00
       100 | "gender" | "1"    |      1.00
       101 | "size"   | "40"   |     40.00
       ...

We have a SQL query that is built dynamically, based on which criteria are currently selected. The query is good for 2-3 criteria, but will get very slow when selecting more than 5 options (each additional INNER JOIN roughly doubles the execution time)

How can we make this SQL faster, maybe even replacing the inner joins with a more performant concept?

This is the query (the logic is correct, just the performance is bad):

-- This SQL is generated when the user selected the following criteria
-- gender: 1
-- color: 80 + 30
-- size 36 + 37 + 38 + 39 + 42 + 46
SELECT
    criteria.group AS `key`,
    criteria.option AS `value`
FROM articles
    INNER JOIN article_criterias AS criteria ON articles.id = criteria.article_id
    INNER JOIN article_criterias AS criteria_gender 
        ON criteria_gender.article_id = articles.id AND criteria_gender.group = "gender"
    INNER JOIN article_criterias AS criteria_color1 
        ON criteria_color1.article_id = articles.id AND criteria_color1.group = "color"
    INNER JOIN article_criterias AS criteria_size2 
        ON criteria_size2.article_id = articles.id AND criteria_size2.group = "size"
    INNER JOIN article_criterias AS criteria_size3 
        ON criteria_size3.article_id = articles.id AND criteria_size3.group = "size"
    INNER JOIN article_criterias AS criteria_size4 
        ON criteria_size4.article_id = articles.id AND criteria_size4.group = "size"
    INNER JOIN article_criterias AS criteria_size5 
        ON criteria_size5.article_id = articles.id AND criteria_size5.group = "size"
    INNER JOIN article_criterias AS criteria_size6 
        ON criteria_size6.article_id = articles.id AND criteria_size6.group = "size"
    INNER JOIN article_criterias AS criteria_size7 
        ON criteria_size7.article_id = articles.id AND criteria_size7.group = "size"
WHERE
    AND (criteria_gender.option IN ("1"))
    AND (criteria_color1.option IN ("80", "30"))
    AND (criteria_size2.option_val BETWEEN 35.500000 AND 36.500000)
    AND (criteria_size3.option_val BETWEEN 36.500000 AND 37.500000)
    AND (criteria_size4.option_val BETWEEN 37.500000 AND 38.500000)
    AND (criteria_size5.option_val BETWEEN 38.500000 AND 39.500000)
    AND (criteria_size6.option_val BETWEEN 41.500000 AND 42.500000)
    AND (criteria_size7.option_val BETWEEN 45.500000 AND 46.500000)

Solution

  • Key/value tables are really a nuisance. However, in order to find certain criteria matches aggregate your data:

    select 
      a.*,
      ac.group AS "key",
      ac.option AS "value"
    from articles a
    join article_criterias ac on ac.article_id = a.article_id
    where a.article_id in
    (
      select article_id
      from article_criterias
      group by article_id
      having sum("group" = 'gender' and option = '1') > 0
         and sum("group" = 'color' and option in ('30','80')) > 0
         and sum("group" = 'size' and option_val between 35.5 and 36.5) > 0
         and sum("group" = 'size' and option_val between 36.5 and 37.5) > 0
         and sum("group" = 'size' and option_val between 37.5 and 38.5) > 0
         and sum("group" = 'size' and option_val between 38.5 and 39.5) > 0
         and sum("group" = 'size' and option_val between 41.5 and 42.5) > 0
         and sum("group" = 'size' and option_val between 45.5 and 46.5) > 0
    )
    order by a.article_id, ac.group, ac.option;
    

    This gets you all articles that are available for gender 1, colors 30 and/or 80, and all listed size ranges, along with all their options. (The size ranges are a bit strange, though; a size 36.5 would meet two ranges for instance.) You get the idea: group by article_id and use HAVING in order to only get article_ids that meet the critria.

    As to indexes you'll want

    create index idx on article_criterias(article_id, "group", option, option_val);