Search code examples
mysqljointagswhere-in

Retrieve items matching tags, and simultaneously retrieve all tags that go along with each item


I have the following tables, relevant to this question:

project
- projectId (PK)
- projectTitle
- projectDescription
- etc..

tag
- tagId
- tagName
- tagDescription
- etc...

project_tag
- projectId (PK / FK -> project.projectId)
- tagId (PK / FK -> tag.tagId)

I'm implementing tag functionality similar to what StackOverflow has, in that one is able to view a list of items (projects, in my case) tagged by one or multiple tags. What I'm trying to do is select all projects that, at a minimum, are tagged with all tags that I feed to the query, but simultaneously retrieve all tags that go with each individual project.

What I have now works, but I have the feeling that the subquery in the WHERE IN clause is very inefficient, as this would probably be executed for each row, correct?

SELECT
    `project`.*,
    GROUP_CONCAT( DISTINCT `tagName` ORDER BY `tagName` SEPARATOR ' ' ) as `tags`
FROM
    `project`
JOIN
    `project_tag`
    USING ( `projectId` )
JOIN
    `tag`
    USING ( `tagId` )
WHERE
    `projectId` IN (
        SELECT
            `projectId`
        FROM
            `project_tag`
        JOIN
            `tag`
            USING ( `tagId` )
        WHERE
            `tagName` IN ( 'the', 'tags' )
        GROUP BY
            `projectId`
        HAVING
            COUNT( DISTINCT `tagName` ) = 2 # the amount of tags in the IN clause
    )
GROUP BY
    `projectId`

Is there any way to JOIN on tag such that I am able to simultaneously retrieve all tags for the JOINed project, while only JOINing projects that (at a minimum) match all tags I feed to the query, without having to use the WHERE IN clause?

To illustrate an example result, consider these example projects:

projectId: 1, tags: php, cms, webdevelopment
projectId: 2, tags: php, cms, ajax
projectId: 3, tags: c#, cms, webdevelopment

Searching for the tags php and cms would yield (these are not formatted as the actual mysql query result, it's just to illustrate the relevant data):

projectId: 1, tags: php, cms, webdevelopment
projectId: 2, tags: php, cms, ajax

Not just:

projectId: 1, tags: php, cms
projectId: 2, tags: php, cms

Solution

  • The sub query is non correlated (it can be taken out and executed on its own without errors) so should be executed once.

    Might be more efficient to have a subquery to exclude the non matching projects first, and join back the other tables against that. Something like this:-

    SELECT project.*, GROUP_CONCAT( DISTINCT tagName ORDER BY tagName SEPARATOR ' ' ) as tags
    FROM (SELECT projectId, COUNT( DISTINCT tagName ) AS TagCount
            FROM tag
            INNER JOIN project_tag USING (tagId)
            WHERE tagName IN ( 'the', 'tags' )
            GROUP BY projectId
            HAVING TagCount = 2) Sub1
    INNER JOIN project ON Sub1.projectId = project.projectId
    INNER JOIN project_tag USING (projectId)
    INNER JOIN tag USING (tagId)
    GROUP BY projectId
    

    I assume you have an index on tagName.