Search code examples
sqlsql-servercommon-table-expressionrecursive-query

CTE query for hierarchical data excluding some records despite having full hierarchy


I have a table that stores hierarchical data, where each record may have a parent_item_code referencing another record in the same table. I'm using a Common Table Expression (CTE) to build the full path of each item, concatenating parent names until reaching the final items (leaves).

Here is the query I’m using:

WITH Hierarchy AS (
  -- Select root records (those that do not have a parent)
  SELECT 
     i.item_code,
     i.name,
     i.parent_item_code,
      CAST(i.name AS VARCHAR(1000)) AS hierarchy
  FROM sgt_consulta.items i
  WHERE i.item_type = 'C' 
      AND i.status = 'A' 
      AND i.parent_item_code IS NULL  -- Only root records

  UNION ALL

  -- For each node found, retrieve its children and concatenate the hierarchy
  SELECT
      c.item_code,
      c.name,
      c.parent_item_code,
      CAST(h.hierarchy + ' > ' + c.name AS VARCHAR(1000)) AS hierarchy
  FROM sgt_consulta.items c
  INNER JOIN Hierarchy h ON c.parent_item_code = h.item_code
  WHERE c.item_type = 'C' 
    AND c.status = 'A'
)
-- Select only leaf records (those that have no children)
SELECT 
  h.item_code,
  h.name,
  h.hierarchy
FROM Hierarchy h
LEFT JOIN sgt_consulta.items i ON h.item_code = i.parent_item_code
WHERE i.parent_item_code IS NULL;  -- Only final records

The query works correctly for most records, but I noticed that some records are missing from the result even though:

They have a complete hierarchy, meaning their ancestors are correctly structured. They do not have NULL values in any important fields (item_code, parent_item_code, name, etc.). They should be identified as leaf records, but they are being excluded. I suspect that the issue may be related to the final LEFT JOIN condition:

LEFT JOIN sgt_consulta.items i ON h.item_code = i.parent_item_code
WHERE i.parent_item_code IS NULL;  -- Only final records

If I remove this condition, the missing records appear again, but then I get non-final items in the results, which I want to avoid.

What could be causing this behavior? Is there a more reliable way to filter only leaf nodes without excluding valid records?

Here are some examples of working records and the one I'm having trouble specifically (could only find this one so far. Don't mind the names in portuguese, they shouldn't affect the recreation):

Records being returned correctly (first row is the leaf node):

item_code name parent_item_code item_type status
261 Carta Precatória Cível 257 C A
257 Cartas 214 C A
214 Outros procedimentos 2 C A
2 PROCESSO CÍVEL E DO TRABALHO NULL C A
item_code name parent_item_code item_type status
1729 Agravo Interno Criminal 412 C A
412 Recursos 268 C A
268 PROCESSO CRIMINAL NULL C A

Troublesome record (first row is the leaf node being excluded):

item_code name parent_item_code item_type status
7 Procedimento Comum Cível 1107 C A
1107 Procedimento de conhecimento 1106 C A
1106 Processo de conhecimento 2 C A
2 PROCESSO CÍVEL E DO TRABALHO NULL C A

Solution

  • Adding the answer as requested. On the previous to last line, that ON predicate is wrong. You need to include the same filters you are using when walking the tree. That is:

    ON h.item_code = i.parent_item_code AND i.item_type = 'C' AND i.status = 'A'
    

    The whole query will look like:

    WITH Hierarchy AS (
      -- Select root records (those that do not have a parent)
      SELECT 
         i.item_code,
         i.name,
         i.parent_item_code,
          CAST(i.name AS VARCHAR(1000)) AS hierarchy
      FROM sgt_consulta.items i
      WHERE i.item_type = 'C' 
          AND i.status = 'A' 
          AND i.parent_item_code IS NULL  -- Only root records
    
      UNION ALL
    
      -- For each node found, retrieve its children and concatenate the hierarchy
      SELECT
          c.item_code,
          c.name,
          c.parent_item_code,
          CAST(h.hierarchy + ' > ' + c.name AS VARCHAR(1000)) AS hierarchy
      FROM sgt_consulta.items c
      INNER JOIN Hierarchy h ON c.parent_item_code = h.item_code
      WHERE c.item_type = 'C' 
        AND c.status = 'A'
    )
    -- Select only leaf records (those that have no children)
    SELECT 
      h.item_code,
      h.name,
      h.hierarchy
    FROM Hierarchy h
    LEFT JOIN sgt_consulta.items i ON h.item_code = i.parent_item_code
     AND i.item_type = 'C' AND i.status = 'A'
    WHERE i.parent_item_code IS NULL;  -- Only final records