Search code examples
sql-servert-sqlrelational-division

Relational division: find each unique set of "child" values in a column


After a lot of searching, it seems most relational division problems target groups with matching members. Maybe I'm having keyword trouble, but I want something a little different: given a parent/group and set of children/members, how do I find each unique combination of members, regardless of parent?

Using the following sample source

CREATE TABLE #m (Parent char(1), ChildID int)

INSERT INTO #m
VALUES ('a',1), ('a', 2), ('a',4),
       ('b',1), ('b', 3), ('b',4),
       ('c',1), ('c', 4), ('c',2),
       ('d',1), ('d',4),
       ('e',3), ('e', 1),
       ('f',4),
       ('g',3), ('g', 4), ('g',1);

SELECT * FROM #m

I'd be looking for a result like (1, 2, 4), (1, 3, 4), (1, 4), (1, 3), (4), expressed as a new temp table (to join back to #m, so that each Parent can be pointed to its "hash" rather than its matching Parent)

There are lots of syntax variations on this stuff; this kind makes the most sense to me, but hasn't gotten me to an answer. Apologies for duplication I can't find.

EDIT: the desired result expressed as a SQL resultset:

UParent ChildID
------- -----------
u1      1
u1      2
u1      4
u2      1
u2      3
u2      4
u3      1
u3      4
u4      1
u4      3
u5      4

Solution

  • I see, you want the "unique" combinations of children, regardless of order.

    The following gets parents that are equivalent:

    select m1.Parent as Parent1, m2.Parent as Parent2
    from (select m.*, count(*) over (partition by Parent) as NumKids
          from #m m
         ) m1 join
         (select m.*, count(*) over (partition by Parent) as NumKids
          from #m m
         ) m2
         on m1.ChildID = m2.ChildID
    group by m1.Parent, m2.Parent
    having count(*) = max(m1.NumKids) and max(m1.NumKids) = max(m2.NumKids);
    

    We can now get what you want using this

    with parents as (
        select m1.Parent as Parent1, m2.Parent as Parent2
        from (select m.*, count(*) over (partition by Parent) as NumKids
              from #m m
             ) m1 join
             (select m.*, count(*) over (partition by Parent) as NumKids
              from #m m
             ) m2
             on m1.ChildID = m2.ChildID
        group by m1.Parent, m2.Parent
        having count(*) = max(m1.NumKids) and max(m1.NumKids) = max(m2.NumKids)
    )
    select distinct m.*
    from (select min(Parent2) as theParent
          from parents
          group by Parent1
         ) p join
         #m m
         on p.theParent = m.Parent;
    

    If you want a new id instead of the old one, use:

    select dense_rank() over (partition by m.Parent) as NewId, m.ChildID
    

    in the select.