Search code examples
sqlsql-servercommon-table-expressionhierarchical-datarecursive-query

Combine values into string inside recursive CTE


I am using a recursive CTE to essentially navigate down a tree structure of id values. At each iteration of the recursion I would like to have a column that represents a string 'list' of all id values (nodes) that have been visited in the iteration steps so far. At first glance it seems like I would need a group by or aggregate function (like string_agg()) to accomplish this, but these are not allowed in the recursive part of a recursive CTE. My question seems to be similar to that found here recursive CTE to combine values, but I am hoping for a slightly more straightforward answer that will relate to the data I have added a sample of below. This first table is essentially the top of the tree. There could be one or multiple nodes at the top depending on how many records are in this first table. Here there is just 1 record:

name_id group_id
100 15

where name_id you could think of as the name of the tree, which is essentially irrelevant to this example since there is only one name_id, and group_id is the top of the tree which will then branch out based on the following table:

group_id group_id_member
10 15
11 15
4 10
11 4
3 11
10 3

where the group_id_member is saying that, for example, the group_id from the first table, 15, is a member of the group_id 10 and also 11. So the tree would look like 15 at the top, with a branch down for 10 and another branch for 11. Then off of 10 comes a branch of 4, and off of 11 comes a branch of 3. Then off of 4 comes a branch of 11, and off of 3 comes a branch of 10. Thus, this is an infinite loop since those branches are already in the table.

Essentially, the goal of this recursive query is to return all the branch nodes but not return any of them more than once. This recursive cte will run forever unless we do a check, like checking whether or not the group_id value is already in rpath. To check that, I am wanting to create a column which is a list of all nodes that have gotten hit from current and previous iterations (rpath column), and append to that column at each iteration, making the result table look something like:

name_id group_id group_id_member iter rpath
100 15 null 1 /15/
100 10 15 2 /15/10/11/
100 11 15 2 /15/10/11/
100 4 10 3 /15/10/11/4/3/
100 3 11 3 /15/10/11/4/3/
100 10 3 4 /15/10/11/4/3/10/11/
100 11 4 4 /15/10/11/4/3/10/11/

Where I included the 4th iteration in the table, which shows that the group_id values 10 and 11 are now in the rpath two times since this loop has been fully completed. Ultimately I would want to terminate this recursion at the 3rd iteration since at the 4th, for both group_id values, they appear in the rpath list already.

So my primary question is, what is the best way to create this rpath string inside the recursive part of the cte? See below for code to create the temp tables, and my code which attempts to solve this but is returning an error since I am trying to use string_agg() in the recursive part of the cte which is not allowed. Thanks!

if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (group_id int, group_id_member int)
INSERT into #t1 VALUES 
   (10,15),
   (11, 15),
   (4, 10), 
   (11, 4),
   (3, 11),
   (10, 3);
if object_id('tempdb..#t2') is not null drop table #t2
CREATE TABLE #t2 (name_id int, group_id int)
INSERT into #t2 VALUES 
   (100, 15)
; with rec as (
select  name_id,
        group_id,
        cast(null as int) as group_id_member,
        1 as iter,
        convert(varchar(128),concat('/', str_agg.agg, '/')) as rpath
    from #t2
    cross apply (select string_agg(group_id, '/') as agg from #t2) as str_agg  -- this does work since it is not in the recursive part of the cte
union all
select  rec.name_id,
        t1.group_id,
        t1.group_id_member,
        iter + 1 as iter,
        convert(varchar(128), concat(rec.rpath, str_agg1.agg1, '/')) as rpath  -- trying to create the string that represents all of the nodes that have been visited
    from #t1 t1
    inner join rec
        on t1.group_id_member = rec.group_id
    cross apply (select string_agg(t1.group_id_member ,'/') as agg1 from #t1 t1 where t1.group_id_member = rec.group_id) str_agg1  -- cross apply to create the string_agg, if this was allowed in recursive cte...
    where rec.rpath not like '%/' + convert(varchar(128), t1.group_id) + '/%'
)
select * from rec order by iter asc

Solution

  • At each iteration of the recursion I would like to have a column that represents a string 'list' of all id values (nodes) that have been visited in the iteration steps so far

    As I understand your question, you don’t need aggregation here. The recursive query processes iteratively, so you can just accumulate the ids of the visited nodes along the way, using concat.

    with rec as (
        select 
            name_id,
            group_id,
            cast(null as int) as group_id_member,
            1 as iter,
            concat('/', convert(varchar(max), group_id)) as rpath
        from #t2
        union all
        select
            r.name_id,
            t1.group_id,
            t1.group_id_member,
            iter + 1 as iter,
            concat(r.rpath, '/', t1.group_id)
        from #t1 t1
        inner join rec r
            on t1.group_id_member = r.group_id
        where r.rpath not like concat('%/', t1.group_id, '/%')
    )
    select * from rec order by iter
    

    Demo on DB Fiddle