I have the following tables, relevant to this question:
project
- projectId (PK)
- projectTitle
- projectDescription
- etc..
tag
- tagId
- tagName
- tagDescription
- etc...
project_tag
- projectId (PK / FK -> project.projectId)
- tagId (PK / FK -> tag.tagId)
I'm implementing tag functionality similar to what StackOverflow has, in that one is able to view a list of items (projects, in my case) tagged by one or multiple tags. What I'm trying to do is select all projects that, at a minimum, are tagged with all tags that I feed to the query, but simultaneously retrieve all tags that go with each individual project.
What I have now works, but I have the feeling that the subquery in the WHERE IN
clause is very inefficient, as this would probably be executed for each row, correct?
SELECT
`project`.*,
GROUP_CONCAT( DISTINCT `tagName` ORDER BY `tagName` SEPARATOR ' ' ) as `tags`
FROM
`project`
JOIN
`project_tag`
USING ( `projectId` )
JOIN
`tag`
USING ( `tagId` )
WHERE
`projectId` IN (
SELECT
`projectId`
FROM
`project_tag`
JOIN
`tag`
USING ( `tagId` )
WHERE
`tagName` IN ( 'the', 'tags' )
GROUP BY
`projectId`
HAVING
COUNT( DISTINCT `tagName` ) = 2 # the amount of tags in the IN clause
)
GROUP BY
`projectId`
Is there any way to JOIN on tag
such that I am able to simultaneously retrieve all tags for the JOIN
ed project, while only JOIN
ing projects that (at a minimum) match all tags I feed to the query, without having to use the WHERE IN
clause?
To illustrate an example result, consider these example projects:
projectId: 1, tags: php, cms, webdevelopment
projectId: 2, tags: php, cms, ajax
projectId: 3, tags: c#, cms, webdevelopment
Searching for the tags php
and cms
would yield (these are not formatted as the actual mysql query result, it's just to illustrate the relevant data):
projectId: 1, tags: php, cms, webdevelopment
projectId: 2, tags: php, cms, ajax
Not just:
projectId: 1, tags: php, cms
projectId: 2, tags: php, cms
The sub query is non correlated (it can be taken out and executed on its own without errors) so should be executed once.
Might be more efficient to have a subquery to exclude the non matching projects first, and join back the other tables against that. Something like this:-
SELECT project.*, GROUP_CONCAT( DISTINCT tagName ORDER BY tagName SEPARATOR ' ' ) as tags
FROM (SELECT projectId, COUNT( DISTINCT tagName ) AS TagCount
FROM tag
INNER JOIN project_tag USING (tagId)
WHERE tagName IN ( 'the', 'tags' )
GROUP BY projectId
HAVING TagCount = 2) Sub1
INNER JOIN project ON Sub1.projectId = project.projectId
INNER JOIN project_tag USING (projectId)
INNER JOIN tag USING (tagId)
GROUP BY projectId
I assume you have an index on tagName.