Search code examples
mysqlsql-match-all

Force mySQL to join a table(to do a unoptimized query, that i need)


This is kinda weird. I have the next query:

SELECT * , GROUP_CONCAT( x.tag
SEPARATOR ',' ) AS tags
FROM tag AS t, tag AS x, tag_message_rel AS r, message m
INNER JOIN `user` AS u ON m.user_id = u.id
WHERE t.tag
IN (
'kikikiki', 'dsa'
)
AND m.id = r.message_id
AND t.id = r.tag_id
AND x.id = r.tag_id
GROUP BY m.id
HAVING COUNT( * ) >=2
ORDER BY m.created_at DESC
LIMIT 0 , 20

As you can see i use t to join find the messages that i want, on the other side i use x to print the tags of a message. I i erase the line:

AND x.id = r.tag_id

I will get the messages that i want, but tags will have ALL the tags in the tags table separated by coma. If i leave the line there, i will only get those 2 tags. If i use explain i get:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  u   system  PRIMARY     NULL    NULL    NULL    1   Using temporary; Using filesort
1   SIMPLE  t   range   PRIMARY,tag     tag     252     NULL    2   Using where
1   SIMPLE  x   eq_ref  PRIMARY     PRIMARY     4   verse.t.id  1    
1   SIMPLE  r   ALL     NULL    NULL    NULL    NULL    180     Using where; Using join buffer
1   SIMPLE  m   eq_ref  PRIMARY     PRIMARY     4   verse.r.message_id  1   Using where

Now im no expert in this, but i think the problem is that it is refusing to re-join a table in the process of optimizing the query.

What do you think? Any quick fix?


Solution

  • The problem is that you are trying to join to the tag table twice, but you really need to join to the tag_message_rel table twice, and from each of these to the respective row in the tag table.

    Think of "table aliases" as referring to a row in a table, not the table itself. That idea helped me to understand complex joins a lot better.

    Here's how I'd write that query:

    SELECT m.*, u.*, GROUP_CONCAT(DISTINCT x.tag) AS tags
    FROM message m
     JOIN `user` u ON (u.id = m.user_id)
     JOIN tag_message_rel r1 ON (m.id = r1.message_id)
     JOIN tag t ON (t.id = r1.tag_id)
     JOIN tag_message_rel r2 ON (m.id = r2.message_id)
     JOIN tag x ON (x.id = r2.tag_id)
    WHERE t.tag IN ('kikikiki', 'dsa')
    GROUP BY m.id
    HAVING COUNT(DISTINCT t.tag) = 2
    ORDER BY m.created_at DESC
    LIMIT 0 , 20;
    

    You should develop the habit of using JOIN syntax consistently. Mixing JOIN and comma-style joins can cause some subtle problems.

    Here's an alternative query, that pulls some of the joins into a non-correlated subquery, so you avoid a Cartesian product between t and x, and eliminate the DISTINCT modifiers in the group functions.

    SELECT m.*, u.*, GROUP_CONCAT(x.tag) AS tags
    FROM message m
     JOIN `user` u ON (u.id = m.user_id)
     JOIN tag_message_rel r ON (m.id = r.message_id)
     JOIN tag x ON (x.id = r.tag_id)
    WHERE m.id = ANY (
        SELECT m2.id 
        FROM message m2 
         JOIN tag_message_rel r2 ON (m2.id = r2.message_id)
         JOIN tag t ON (t.id = r2.tag_id)
        WHERE t.tag IN ('kikikiki', 'dsa') 
        GROUP BY m2.id 
        HAVING COUNT(t.tag) = 2)
    GROUP BY m.id
    ORDER BY m.created_at DESC
    LIMIT 0 , 20;