So I have this unholy query, and need to clean it up, as it takes ~2 minutes to execute. I can't change any table structure, but I can split it up into sub queries in loops, etc. I'm using C++ and MySQL.
Basically tag(s) are selected, and any user who has a union with the tag must then be selected by the query.
Here is the query, with 123 being a CSV tag id list of length >= 1, and the josh@test.com being a CSV email list of emails to ignore, of length >= 0. I know this is asking a lot, but any advice would be greatly appreciated.
SELECT user_id,user_primaryemail,USER_EMAIL_IS_VALID
FROM users
WHERE ( ( user_id IN ( SELECT union_target_id
FROM systemtag_union
WHERE union_systemtag_id IN ( '123' )
&& union_type = 'User'
GROUP BY union_target_id
HAVING COUNT(DISTINCT union_systemtag_id) = 0) ) )
&& user_primaryemail NOT IN ( 'josh@test.com' )
&& USER_EMAIL_IS_VALID != 'No'
GROUP BY user_primaryemail
Rough table structure:
users
-----
user_id
user_primaryemail
user_email_is_valid
systemtags
-----
systemtag_id
systemtag_union
-----
union_systemtag_id (corresponds to systemtags.systemtag_id)
union_target_id (corresponds, in this case, to users.user_id)
union_type (the type of the union, irrelevant in this case)
EDIT: Here is the result of EXPLAIN, as a CSV:
"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
1,"PRIMARY","users","ALL","user_email","","","",9104,"Using where; Using temporary; Using filesort"
2,"DEPENDENT SUBQUERY","systemtag_union","index","union_systemtag_id,union_type","union_target_id","4","",8,"Using where"
So what I ended up doing was twofold. I added indexes and reoptimized my tables, which helped a bit, then I completely extracted the systemtag subquery, and stored it in a variable, which I then plugged into the bigger query. Even though the subquery only took .2 seconds, it must have been executing for each iteration of the 20k user database. Thanks so much to all of you, your guidance was indispensable.