Search code examples
sqlpostgresqlaggregate-functionsrelational-division

SQL query over parents with exact same set of children


I have parent-child relation in my table (simplified version listed below):

| parent | child |
|--------+-------|
| p1     | c1    |
| p1     | c2    |
|--------+-------|
| p2     | c1    |
| p2     | c2    |
|--------+-------|
| p3     | c1    |
|--------+-------|
| p4     | c2    |
|--------+-------|
| p5     | c3    |

I need to query all parents with the exact same set of children like p1 and p2.

Seems to be related to this question, but there are hard-coded children.


Solution

  • This query lists all parents sharing the same children, where there are more than one parent:

    SELECT array_agg(parent) AS parents, children, count(*) AS dupes
    FROM (
       SELECT parent, array_agg(child ORDER BY child) AS children
       FROM   tbl
       GROUP  BY 1
       ) sub
    GROUP  BY 2
    HAVING count(*) > 1;
    

    Simple, but not the fastest possible solution. For big tables, I would try something else.

    The problem belongs to the "relational division" family. Here is a collection of query styles we assembled some time ago:
    How to filter SQL results in a has-many-through relation

    One way (of many) to get single parent per row:

    WITH cte AS (
       SELECT parent, array_agg(child ORDER BY child) AS children
       FROM   tbl
       GROUP  BY 1
       )
    SELECT *
    FROM   cte c
    WHERE EXISTS (
       SELECT 1
       FROM   cte c1
       WHERE  c1.children = c.children
       AND    c1.parent <> c.parent
       );