Search code examples
sqlpostgresqlgraph-theoryrecursive-querypostgresql-12

How can i find all linked rows by array values in postgres?


i have a table like this:

id | arr_val  | grp
-----------------
1  | {10,20}  | -
2  | {20,30}  | -
3  | {50,5}   | -
4  | {30,60}  | -
5  | {1,5}    | -
6  | {7,6}    | -

I want to find out which rows are in a group together. In this example 1,2,4 are one group because 1 and 2 have a common element and 2 and 4. 3 and 5 form a group because they have a common element. 6 Has no common elments with anybody else. So it forms a group for itself. The result should look like this:

id | arr_val  | grp
-----------------
1  | {10,20}  | 1
2  | {20,30}  | 1
3  | {50,5}   | 2
4  | {30,60}  | 1
5  | {1,5}    | 2
6  | {7,6}    | 3

I think i need recursive cte because my problem is graphlike but i am not sure how to that.

Additional info and background:

The Table has ~2500000 rows.

In reality the problem i try to solve has more fields and conditions for finding a group:

id | arr_val  | date | val | grp
---------------------------------
1  | {10,20}  | -
2  | {20,30}  | -

Not only do the element of a group need to be linked by common elements in arr_val. They all need to have the same value in val and need to be linked by a timespan in date (gaps and islands). I solved the other two but now the condition of my question was added. If there is an easy way to do all three together in one query that would be awesome but it is not necessary.

----Edit-----

While both answer work for the example of five rows they do not work for a table with a lot more rows. Both answers have the problem that the number of rows in the recursive part explodes and only reduce them at the end. A solutiuon should work for data like this too:

id | arr_val  | grp
-----------------
1  | {1}      | -
2  | {1}      | -
3  | {1}      | -
4  | {1}      | -
5  | {1}      | -
6  | {1}      | -
7  | {1}      | -
8  | {1}      | -
9  | {1}      | -
10 | {1}      | -
11 | {1}      | -
more rows........

Is there a solution to that problem?


Solution

  • You can handle this as a recursive CTE. Define the edges between the ids based on common values. Then traverse the edges and aggregate:

    with recursive nodes as (
          select id, val
          from t cross join
               unnest(arr_val) as val
         ),
         edges as (
          select distinct n1.id as id1, n2.id as id2
          from nodes n1 join
               nodes n2
               on n1.val = n2.val
         ),
         cte as (
          select id1, id2, array[id1] as visited, 1 as lev
          from edges
          where id1 = id2
          union all
          select cte.id1, e.id2, visited || e.id2,
                 lev + 1
          from cte join
               edges e
               on cte.id2 = e.id1
          where e.id2 <> all(cte.visited) 
         ),
         vals as (
          select id1, array_agg(distinct id2 order by id2) as id2s
          from cte
          group by id1
        )
    select *, dense_rank() over (order by id2s) as grp
    from vals;
    

    Here is a db<>fiddle.