Search code examples
sqlrecursionmany-to-many

Recursive query from a many-to-many table


In a many-to-many mapping table of As to Bs, say there is an initial condition f(A, B) -> bool, which some pairs of (A, B) from that mapping table satisfy.

The problem is to select all pairs where either the A in the pair appears in the set that satisfies the condition or the B in the pair appears in the set that satisfies the condition. Then continue expanding the select set that way until no more distinct pairs are added, so that at the end, the selection contains all pairs from which you could “walk” to a pair that satisfies the initial condition.

Example:

A_id    B_id
------------
 A1      B1
 A1      B3
 A2      B1
 A2      B2

Suppose that the pairs (A2, B1) and (A2, B2) satisfy f, and that the pairs (A1, B1) and (A1, B3) do not. I need the returned set to contain all four pairs, though, because B1 appears in the set that satisfies f (so we include (A1, B1)) and A1 now appears in the growing set (so we include (A1, B3)).

What is the right way to do this server-side? Is doing it on the server better than storing some intermediate results on the client and performing the recursion in several "incremental" queries? How does the answer change when the number of rows is very big?

(I need to do this in SQL Server but that's not necessary for the question.)

Example solution in pseudo-query language:

declare @results table(A_id, B_id)

-- initial set, satisfying the condition
insert into @results -- adds (A2, B1) and (A2, B2)
select A_id, B_id from mapping m where f(A_id, B_id) == true

-- first round of recursion
insert into @results
select m.A_id, r.B_id from mapping m -- adds (A1, B1)
join @results r on r.B_id = m.B_id
insert into @results
select r.A_id, m.B_id from mapping m
join @results r on r.A_id = m.B_id

-- second round of recursion
insert into @results
select m.A_id, r.B_id from mapping m
join @results r on r.B_id = m.B_id
insert into @results
select r.A_id, m.B_id from mapping m -- adds (A1, B3)
join @results r on r.A_id = m.B_id

Solution

  • If you add a third column that gives the f value in your table, you can use this query.

    with CTE as 
    (
        Select A_ID as Node, Path = CAST(a_id as nvarchar(max))
        from Link
        where f = 1
    
        UNION
    
        Select B_ID as Node, Path = CAST(B_ID as nvarchar(max))
        from Link
        where f = 1
    
        UNION ALL
    
        Select L.B_ID, CTE.Path + '-' + L.B_ID
        from CTE inner join Link L ON CTE.node = L.A_ID
        where CTE.path not like '%' + L.B_ID +'%'
    
        UNION ALL
    
        Select L.A_ID, CTE.Path + '-' + L.A_ID
        from CTE inner join Link L ON CTE.node = L.B_ID
        where CTE.path not like '%' + L.A_ID +'%'
    )
    select distinct Node from CTE