Search code examples
mysqlsqlcross-join

SQL find all homologous categories


I have a many-to-many relation between Category and Item, which is implemented with the "category_item" table, having 2 columns (PK): category_id and item_id.

Two homologous categories have the same items (order is not relevant):

A
  1
  2

is homologous to

B
  1
  2

but is not to

C       D       E
  1       1       1
  3               2
                  3

Given a category id, how can I find all homologous categories?


Suppose this data in item_category:

A   1
A   2
B   1
B   2
C   1
C   3
D   1
E   1
E   2
E   3

I want to find all categories homologous to A (the expected result is just B)

I'm currently trying something like:

select r2.category_id
from category_item r1, category_item r2
where r1.category_id = ?
    and r2.category_id <> r1.category_id
    and r1.item_id = r2.item_id

that builds the table:

A   1   B   1
A   1   C   1
A   1   D   1
A   1   E   1
A   2   B   2
A   2   D   2
A   2   E   2

but I don't know how to continue...


I'm using MySQL 5.7, but I'd like to do it with just generic SQL.

note that this is not a homework (also I don't think any teacher will assign such a complex one), it's just a extra-simplified use case for a real world problem


Solution

  • You can also do it using subqueries and exists/ not exists

    Select Distinct category_id 
    from category_item ci
    Where not exists   -- this allows only cats that do not have all req items
          (select * from category_item
           Where category_id = ci.category_id 
             and item_id Not in 
                (Select item_id from category_item
                 Where category_Id = @catId))
      and not exists   -- this filters out cats that have xtra items
          (Select * from category_item
           Where category_Id = @catId
              and item_id Not in
                (Select item_id from category_item
                 Where category_Id = ci.category_Id )) 
      and category_Id <> @catId -- <- categoryId of category you are matching
                                -- this line filters out the category you are 
                                -- matching against. Remove it if you want all
                                -- homologous categories
    

    EXPLANATION:

    Select All distinctCategory_Ids from the join table where: First subquery: where there is not another join table row for that same category with anItem_idthat is *not* in the set ofItem_ids for the category you are matching against, and Second subquery: where there is not an item in the category you are matching against that is not also in the category you are selecting (ci table)