Search code examples
mysqlrelational-division

MySQL Select ID's which occur on different rows with multiple specific values for a column


I'm trying to select items from an associative table that have satisfy two or more values of the same field, sounds confusing, let me explain.

+-----------------------+
| item_id | category_id |
+-----------------------+
|   1     |    200      |
|   1     |    201      |
|   1     |    202      |
|   2     |    201      |
|   2     |    202      |
|   3     |    202      |
|   3     |    203      |
|   4     |    201      |
|   4     |    207      |
+-----------------------+

In the table, I want to be able to select only items which are in the categories that I pass. So for example, if I pass category IDs of 201 and 202, I would want only items that are in BOTH categories (they can have other categories but need to be in at least the categories I'm querying), so in this case, I would want items 1 and 2 only since they are the only ones that are in categories 201 and 202.

My initial SQL statement was something like

SELECT * FROM item_category WHERE category_id = 201 AND category_id = 202;

But obviously that won't work.

SELECT * FROM item_category WHERE category_id = 201 OR category_id = 202;

The above query also won't work because it'll return items 4 and 3 as well.

So how would I go about only selecting items that have to at least be in both categories?

Keep in mind that I might be passing more than 2 category IDs.

Thank you for your help.


Solution

  • Your expression in a WHERE clause works against a single row of the joined result set. That's why WHERE category_id = 201 AND category_id = 202 doesn't work -- because it can't be two values on a single row.

    So you need some way to join two rows from the table into one row of the result set. You can do this with a self-join:

    SELECT c1.item_id
    FROM item_category AS c1
    INNER JOIN item_category AS c2 ON c1.item_id = c2.item_id
    WHERE c1.category_id = 201 AND c2.category_id = 202
    

    This technique is hard to scale up when you want to search for three, four, five or more values, because it requires N-1 joins to match N values.

    So another method is to use GROUP BY:

    SELECT c.item_id, COUNT(*) AS cat_count
    FROM item_category AS c
    WHERE c.category_id IN (201,202)
    GROUP BY c.item_id
    HAVING cat_count = 2
    

    Both techniques are okay, and work better in different circumstances.