Search code examples
postgresqlunion-find

Unions of intersecting sets


I have a table representing sets of records like

set_id | record_id
     a |         1
     a |         2
     a |         3
     b |         2
     b |         4
     b |         5
     c |         6
     c |         7
     d |         9
     d |        11
     e |        10
     f |        11
     f |        12   

I want to yield output like this

output
{1, 2, 3, 4, 5}
{6, 7}
{9, 11, 12}
{10}     

Where intersecting sets are combined (notice set a and set b have been combined; d and f have also been combined).

Is there a good way of doing this with SQL, not a stored procedure. I know that I'm looking for a kind of Union-Find procedure.


Solution

  • prepare:

    so=> create table so75(set_id text, record_id int);
    CREATE TABLE
    so=> copy so75 from stdin;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> ^CERROR:  COPY from stdin failed: canceled by user
    CONTEXT:  COPY so75, line 1
    so=> copy so75 from stdin delimiter '|';
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >>      a |         1
     a |         2
     a |         3
     b |         2
     b |         4
     b |         5
     c |         6
     c |         7
     d |         9
     d |        11
     e |        10
     f |        11
     f |        12   
    >> >> >> >> >> >> >>
    >> \.
    COPY 14
    

    qry:

    so=> with keys as (
      with a as (
        select *,count(1) over (partition by record_id) c, array_agg(set_id) over(partition by record_id) cc
        from so75
      )
      select set_id, cc
      from a where c > 1
    )
    select distinct array_agg(distinct record_id)
    from so75
    left outer join keys on keys.set_id = so75.set_id
    group by case when array_length(cc,1) > 1 then cc::text else so75.set_id end;
      array_agg
    -------------
     {6,7}
     {10}
     {1,2,3,4,5}
     {9,11,12}
    (4 rows)