Search code examples
sqlpostgresqlrecursioncommon-table-expression

Trouble Filtering in PostgreSQL With Recursive CTE Query


I am having trouble formulating a query to meet the following criteria (have a basic CTE query working) but am thinking I am missing/overlooking something and am hoping SO can help (much appreciated in advance!). I've tried looking at some of the other SO posts about recursive CTE's, but unfortunately don't feel like I've been able to answer my question(s) while following along with those posts.

Given a categories table with the following columns:

    `id` int - primary key
    `limit_to` text[] - array of text / is nullable
    `parent_id` int - references `id` / is nullable
    
    with values:
    
    id | limit_to | parent_id
    -------------------------
    1  | {'tc'}   | null
    -------------------------
    2  | null     | 1
    -------------------------
    3  | null     | 2
    -------------------------
    4  | null     | null
    -------------------------
    5  | {'ta'}   | null

I am trying to write a query that will filter out records (parent/child associated) having limit_to IS NOT NULL. In this instance, I would expect rows 1,2,3 and 5 to be filtered out, as 1 has a limit_to value of {'tc'} and its descendants are 2 and 3, respectively. Row 5 also has a value that isn't null {'ta'}. So in this case only row 4 would be returned in the final select.

Query I've tried so far:

with recursive tree as (
  (
    select 
      * 
    from 
      categories 
    where 
      limit_to is null
  ) 
  union all 
    (
      select 
        categories.* 
      from 
        categories 
        inner join tree on tree.id = categories.parent_id 
      where 
        tree.limit_to is null
    )
) 
select 
  id, 
  parent_id 
from 
  categories 
where 
  id in (
    select 
      id 
    from 
      tree
  ) 
  and limit_to is null 
order by 
  id asc

However with this example query, I am still seeing the descendants of row 1 (2 and 3), even though they are recursive children on parent_id = 1, which is being filtered out because its limit_to record has a value. I have a feeling I am missing placing an additional WHERE filter expression, but am not entirely sure on where that is.

I was also working on a query to match where limit_to IS NULL OR '''tc''' = ANY(limit_to) to match on records only having that criteria (so in this case row 5 would be eliminated/not returned, and the other records would return), but I have a feeling I can't get that query working fully until I understand what condition I am missing for filtering entirely on where limit_to IS NULL and ensuring child rows are properly filtered out.

I greatly appreciate any insight on what I might be missing / any pointers ( but not null :P ) in the right direction on how to get the queries properly filtering as I would expect based on what I describe above. Thank you!


Solution

  • As you walk the tree recursively, you need to keep some sort of flag which denotes whether or not an instance of categories.limit_to is not null has been seen in any parent records to the current record. This flag is updated at every join to be true if the new record has categories.limit_to is not null. Then, when you query from the cte, you can simply scan for records containing a flag of false:

    with recursive cte as (
       select c.*, c.limit_to is not null f from categories c where c.parent_id is null
       union all
       select c.*, c1.f or c.limit_to is not null f from cte c1 
       join categories c on c.parent_id = c1.id
    )
    select id, limit_to, parent_id from cte where not f
    

    See fiddle.