Search code examples
sqlrelational-division

JOIN, where all rows are subset of another table


I want to join over 10 tables, then get id column from a specific table. The scenario is a little complicated, but let's simplify it here.

Let's say we have three tables, posts, users and tags. Like thsese:

+-------------------+
| posts             |
+----------+--------+
| posts_id | title  |
+----------+--------+
| 1        | post_1 |
| 2        | post_2 |
+----------+--------+

+------------------+
| users            |
+----------+-------+
| users_id | email |
+----------+-------+
| 1        | [email protected] |
| 2        | [email protected] |
+----------+-------+

+-----------------+
| tags            |
+---------+-------+
| tags_id | label |
+---------+-------+
| 1       | tag_1 |
| 2       | tag_2 |
| 3       | tag_3 |
| 4       | tag_4 |
+---------+-------+

Each posts has many tags:

+------------------------+
| posts_tags             |
+----+---------+---------+
| id | post_id | tags_id |
+----+---------+---------+
| 1  | 1       | 1       |
| 2  | 1       | 2       |
| 3  | 1       | 3       |
| 4  | 2       | 2       |
| 5  | 2       | 3       |
+----+---------+---------+

Also, each users is subscribed to many tags too:

+------------------------+
| users_tags             |
+----+---------+---------+
| id | user_id | tags_id |
+----+---------+---------+
| 1  | 1       | 1       |
| 2  | 1       | 2       |
| 4  | 2       | 2       |
| 5  | 2       | 3       |
| 6  | 2       | 4       |
+----+---------+---------+

I want to find all posts that all of their tags, are subscribed by a user.

In example above, post_2 has 2 tags: tag_2 and tag_3. Also, user_2 has subscribed to 3 tags: tag_2, tag_3, and tag_4

All posts_2 tags are subset of user_2 tags, so I want to select posts_2's posts_id column But posts_ has a tag that the user_2 is not subscribed to (tag_1), so I don't want to select this row.

Here is a sample code I wrote, but is incomplete:

SELECT `posts_id` FROM `posts`
INNER JOIN `posts_tags` USING (`posts_id`)
INNER JOIN `users_tags` USING (`tag_id`)
***** SOME MAGIC HERE *****
WHERE `users_tags`.`user_id` = 2

The SQL code should work on both sqlite and mysql/mariadb.


Solution

  • Do I understand your request correctly: You want to show all posts for which at least one user subscribed to all its tags?

    In that case cross join posts_tags with users, so as to get all data sets (groups of post + user) that may be a match. I put separator lines between the groups, so you can see them better:

    +----+---------+---------+----------+-------+
    | id | post_id | tags_id | users_id | email |
    +----+---------+---------+----------+-------+
    | 1  | 1       | 1       | 1        | [email protected] |
    | 2  | 1       | 2       | 1        | [email protected] |
    | 3  | 1       | 3       | 1        | [email protected] |
    +----+---------+---------+----------+-------+
    | 1  | 1       | 1       | 2        | [email protected] |
    | 2  | 1       | 2       | 2        | [email protected] |
    | 3  | 1       | 3       | 2        | [email protected] |
    +----+---------+---------+----------+-------+
    | 4  | 2       | 2       | 1        | [email protected] |
    | 5  | 2       | 3       | 1        | [email protected] |
    +----+---------+---------+----------+-------+
    | 4  | 2       | 2       | 2        | [email protected] |
    | 5  | 2       | 3       | 2        | [email protected] |
    +----+---------+---------+----------+-------+
    

    Then outer join the users_tags:

    +----+---------+---------+----------+-------+----+---------+---------+
    | id | post_id | tags_id | users_id | email | id | user_id | tags_id |
    +----+---------+---------+----------+-------+----+---------+---------+
    | 1  | 1       | 1       | 1        | [email protected] | 1  | 1       | 1       |
    | 2  | 1       | 2       | 1        | [email protected] | 2  | 1       | 2       |
    | 3  | 1       | 3       | 1        | [email protected] |    |         |         |
    +----+---------+---------+----------+-------+----|---------+---------+
    | 1  | 1       | 1       | 2        | [email protected] |    |         |         |
    | 2  | 1       | 2       | 2        | [email protected] | 4  | 2       | 2       |
    | 3  | 1       | 3       | 2        | [email protected] | 5  | 2       | 3       |
    +----+---------+---------+----------+-------+----+---------+---------+
    | 4  | 2       | 2       | 1        | [email protected] | 1  | 1       | 2       |
    | 5  | 2       | 3       | 1        | [email protected] |    |         |         |
    +----+---------+---------+----------+-------+----|---------+---------+
    | 4  | 2       | 2       | 2        | [email protected] | 4  | 2       | 2       |
    | 5  | 2       | 3       | 2        | [email protected] | 5  | 2       | 3       |
    +----+---------+---------+----------+-------+----+---------+---------+
    

    If you now look at the post/user groups, you see that we have only one match: For post 2 / user 2 there is no user tag missing. Thus we find post 2. Depending on the data there could also be more than one user matching a post's tags, so we must add DISTINCT in order not to select the same post multiple times.

    The query:

    select distinct pt.posts_id
    from posts_tags pt
    cross join users u
    left join users_tags ut on ut.tags_id = pt.tags_id and ut.user_id = u.user_id
    group by pt.posts_id, u.user_id
    having count(*) = count(ut.id)
    order by pt.posts_id;
    

    If you just want to check for one user, e.g. user 2, then the query becomes a tad simpler, because we don't have to join users anymore and we cannot get duplicates:

    select pt.posts_id
    from posts_tags pt
    left join users_tags ut on ut.tags_id = pt.tags_id and ut.user_id = 2
    group by pt.posts_id
    having count(*) = count(ut.id)
    order by pt.posts_id;