Search code examples
mysqlsqlmany-to-manytaxonomy

How to efficiently select distinct tags with associated items in many-to-many relationship?


I'm building a system that has items and tags, with a many-to-many relationship (via an intermediate table), in MySQL. As I've scaled it up, one query has become unacceptably slow, but I'm struggling to make it more efficient.

The query in question amounts to "select all tags that have an item of type x associated with them". Here's a very slightly simplified version:

SELECT DISTINCT(t.id)
FROM tags t 
INNER JOIN items_tags it ON it.tag_id = t.id
INNER JOIN items i ON it.item_id = i.id
WHERE i.type = 10

I have unique primary indexes on t.id, item.id and "it.tag_id, it.item_id". The problem I'm having is that the items_tags table is at a size (~1,400,000 rows) where the query takes too long (one thing that puzzles me here is that phpMyAdmin seems to think the query is fast - it times it as a few ms, but in practice it seems to take 6 or 7 seconds).

It feels to me as if there might be a way of joining the items_tags table to itself to reduce the size of the result set (and perhaps remove the need for that DISTINCT clause), but I can't figure out how... Alternatively, it occurs to me that there might be a better way of indexing things. Any help or suggestions would be much appreciated!


Solution

  • Well, for the record, here's what worked for me (though I'd still be interested if anyone has any other suggestions).

    It was pointed out (in the comments above - thanks @Turophile!) that since tag id is available in the items_tags table, I could leave the tags table out. I actually did need other fields (eg. name) from the tags table (I simplified the query a little for the question), but I found that removing the tags table from the above query and joining the tags table onto its results was significantly faster (EXPLAIN showed that it allowed fewer rows to be scanned). That made the query look more like this:

    SELECT 
        tags.id,
        tags.name
    FROM tags 
    INNER JOIN (
        SELECT DISTINCT(it.tag_id) AS tag_id
        FROM items_tags it
        JOIN items i ON it.item_id = i.id
        WHERE i.type = 10 
    ) it ON tags.id = it.tag_id
    

    This was about 10x faster than the previous version of the query (reduced the average time from about 27s to ~2.5s).

    On top of that, adding an index to items.type improved things further (reduced the average time from ~2.5s to ~1.2s).