I have a table of parent-child relationships, and for a list of nodes (let's call them "roots"), I want to see which ones have one (or more) parents with some characteristic (let's call it "bad").
I can write a recursive CTE to find all parent nodes, and then compare them against a list of bad nodes. However, this is slow, as a node can have very many parents, and I am not really interested in finding all of them. I only need to know if some parent is bad. I would like to stop the recursion for a particular root when a bad parent is found.
Here is a mock-up of what the query could look like:
with recursive_table (root, level, path, current_node) as (
select root = node, level = 0, path = 'root', current_node = node
from seed_nodes
UNION ALL
select o.root, level + 1, CONCAT(e.parent, '/', o.path), e.parent
from structure e
inner join recursive_table o
on o.current_node = e.child
-- it is possible to stop searching this particular branch if we pass a bad node
and o.node not in (
select b.node
from bad_node_list b
)
-- it is NOT possible to stop investigating a particular root if any bad node is found
and o.root not in (
select oo.root
from recursive_table oo
join bad_node_list b
on oo.current_node = b.node
where o.root = oo.root
)
)
select *
from seed_nodes a
outer apply (
select top 1 current_node
from recursive_table b
join bad_node_list c
on b.current_node = c.node
where a.node = b.root
) b
As the code in the comments say, it is possible to stop searching in a branch when a bad node is found. But it is not possible to stop searching the root altogether when the first bad node is found. SQL Server will give the error "Recursive member of a common table expression 'recursive_table' has multiple recursive references."
Is there any way around this? (I know that SQL server tries to disallow some other functions, like aggregate functions, but that it's possible to get around that and still use them.)
Is there any way to access the other branches in a recursive function? No.
Recursion in SQL Server is implemented so that this is not possible. Additionally, recursion actually happens depth-first, and not breadth-first (as the syntax of SQL recursion might have you believe).
It is possible to rewrite this as an actual breadth-first search, by using a loop, and include the condition to stop searches when the first bad node is found.
For my particular set of data, this did not improve performance, and was instead around 50% more time-consuming (15 minutes instead of 10 minutes).