Search code examples
mysqlsqlinner-joinquery-performanceentity-attribute-value

MySQL mass INNER JOIN queries cause slow response


I've read the others related questions but mine is unique because of it's structure.

My App is storing about 10,000+ users with profiles defined by many parameters (gender, weight, height, hair color, eye color, dancing skills... and so on, about 100 attributes, lets say).

The application construct a Filter Form with these attributes. User is filtering database using this form so constructs a Query with many sub-queries, one for each filter used.

The problem is that using more than 8-9 filters, the engine will crash into a very long response (I had to kill the process after 30m of waiting).

So, this is the structure of Database

Table def_attributes (here are the attributes definitions)

  • id ---> uses as attr_id in values table

Table utilizatori (users definitions, only column activ is used now)

  • id ---> is named user_id in rest of tables
  • activ ---> is 1 if user is active and will be displayed (column indexed)

Table val_atribute (storing the values of attributes for each user)

  • attr_id ---> the attrID of a filter (column indexed)
  • attr_value ---> value of the attribute
  • user_id (column indexed)

For example, here is a query constructed by the Filtering form, that lags out:

SELECT DISTINCT Q1.user_id
FROM   (SELECT DISTINCT val_atribute.user_id
        FROM   val_atribute
        WHERE  attr_id = 45
               AND attr_value IN ( 'Actor', 'Actor Amator' )) Q1
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 46
                          AND Floor(Datediff(Curdate(), attr_value) / 365) >= '20') Q2
               ON Q1.user_id = Q2.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 46
                          AND Floor(Datediff(Curdate(), attr_value) / 365) <= '50') Q3
               ON Q2.user_id = Q3.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 47
                          AND attr_value IN ( 'feminin', 'masculin' )) Q4
               ON Q3.user_id = Q4.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 102
                          AND attr_value IN ( 'African', 'Asiatic', 'Caucazian', 'Metis' )) Q5
               ON Q4.user_id = Q5.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 103
                          AND attr_value >= 1) Q6
               ON Q5.user_id = Q6.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 103
                          AND attr_value <= 200) Q7
               ON Q6.user_id = Q7.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 104
                          AND attr_value >= 10) Q8
               ON Q7.user_id = Q8.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 104
                          AND attr_value <= 150) Q9
               ON Q8.user_id = Q9.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 107
                          AND attr_value IN ( 'Albastri', 'Caprui', 'Heterocrom', 'Verzi' )) Q10
               ON Q9.user_id = Q10.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 108
                          AND attr_value IN ( 'Blond', 'Brunet', 'Castaniu', 'Roscat', 'Saten' )) Q11
               ON Q10.user_id = Q11.user_id
       INNER JOIN (SELECT DISTINCT val_atribute.user_id
                   FROM   val_atribute
                   WHERE  attr_id = 109
                          AND attr_value IN ( 'Calvitie', 'Lung', 'Mediu', 'Scurt', 'Zero' )) Q12
               ON Q11.user_id = Q12.user_id
       INNER JOIN (SELECT DISTINCT utilizatori.id
                   FROM   utilizatori
                   WHERE  activ = 1) Q13
               ON Q12.user_id = Q13.id
GROUP  BY user_id

Q2 is calculating AGE becuase weonly have attribute [Date of birth] and filter Q2 wants age > 20.

The last Query (here Q13) is always mathing active user from Table utilizatori.

I think is a problem with cartesian progression but Question: how can I remake the query to make it way more faster? Thank you very much!

Edit / Problem solved:

Following Gordon Linoff's great help, I've constructed the correct query, using same filters:

SELECT u.id FROM utilizatori u WHERE EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 45 AND attr_value IN ( 'Actor', 'Actor Amator' )) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 46 AND Floor(Datediff(Curdate(), attr_value) / 365) >= 20) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 46 AND Floor(Datediff(Curdate(), attr_value) / 365) <= 50) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 47 AND attr_value IN ( 'feminin', 'masculin' )) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 102 AND attr_value IN ( 'African', 'Asiatic', 'Caucazian', 'Metis' )) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 103 AND attr_value >= 1) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 103 AND attr_value <= 200) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 104 AND attr_value >= 10) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 104 AND attr_value <= 150) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 107 AND attr_value IN ( 'Albastri', 'Caprui', 'Heterocrom', 'Verzi' )) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 108 AND attr_value IN ( 'Blond', 'Brunet', 'Castaniu', 'Roscat', 'Saten' )) AND EXISTS (SELECT 1 FROM val_atribute va WHERE va.user_id = u.id AND va.attr_id = 109 AND attr_value IN ( 'Calvitie', 'Lung', 'Mediu', 'Scurt', 'Zero' )) AND activ = 1

Now query takes about 0.0015 seconds to run.


Solution

  • Subqueries in MySQL are problematic -- and the select distinct makes things worse. You are connecting the subqueries using and. I would suggest constructing the same logic using exists instead.

    So:

    select u.*
    from users u
    where exists (select 1
                  from val_atribute va
                  where va.user_id = u.user_id and
                        va.attr_id = 45 and
                        va.attr_value in ( 'Actor', 'Actor Amator' )
                 ) and
          exists (select 1
                  from val_atribute va
                  where va.user_id = u.user_id and
                        va.attr_id = 46 and
                        Floor(Datediff(Curdate(), va.attr_value) / 365) >= 20) Q2
                 ) and
          . . .
    

    This version of the query can take advantage of an index on val_attribute(user_id, attr_id, attr_value). It should be much faster and have better scalability.