Search code examples
mysqlperformancedatabase-optimization

Optimize A Messy Query


So I have this unholy query, and need to clean it up, as it takes ~2 minutes to execute. I can't change any table structure, but I can split it up into sub queries in loops, etc. I'm using C++ and MySQL.

Basically tag(s) are selected, and any user who has a union with the tag must then be selected by the query.

Here is the query, with 123 being a CSV tag id list of length >= 1, and the josh@test.com being a CSV email list of emails to ignore, of length >= 0. I know this is asking a lot, but any advice would be greatly appreciated.

SELECT user_id,user_primaryemail,USER_EMAIL_IS_VALID
FROM users
WHERE ( ( user_id IN ( SELECT union_target_id
                       FROM systemtag_union
                       WHERE union_systemtag_id IN ( '123' )
                         && union_type = 'User'
                       GROUP BY union_target_id
                       HAVING COUNT(DISTINCT union_systemtag_id) = 0) ) )
  && user_primaryemail NOT IN ( 'josh@test.com' )
  && USER_EMAIL_IS_VALID != 'No'
GROUP BY user_primaryemail

Rough table structure:

users
-----
user_id
user_primaryemail
user_email_is_valid

systemtags
-----
systemtag_id

systemtag_union
-----
union_systemtag_id (corresponds to systemtags.systemtag_id)
union_target_id (corresponds, in this case, to users.user_id)
union_type (the type of the union, irrelevant in this case)

EDIT: Here is the result of EXPLAIN, as a CSV:

"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
1,"PRIMARY","users","ALL","user_email","","","",9104,"Using where; Using temporary; Using filesort"
2,"DEPENDENT SUBQUERY","systemtag_union","index","union_systemtag_id,union_type","union_target_id","4","",8,"Using where"

Solution

  • So what I ended up doing was twofold. I added indexes and reoptimized my tables, which helped a bit, then I completely extracted the systemtag subquery, and stored it in a variable, which I then plugged into the bigger query. Even though the subquery only took .2 seconds, it must have been executing for each iteration of the 20k user database. Thanks so much to all of you, your guidance was indispensable.