Search code examples
sqlsql-serverrecursioncommon-table-expressionhierarchy

Create a query that puts hierarchy levels into columns


I am trying to write a query in mssql that converts below input table into the desired output

I have a table that represents hierarchy by self-referencing using parent_id column.

Here is an example of input table that covers my case.

id current_level name parent_id
1 1 Oldest A NULL
2 1 Oldest B NULL
3 2 Old A 1
4 2 Old B 2
5 3 Young A 3
6 3 Young B 4
7 3 Young C 4
8 4 Youngest A 5
9 4 Youngest B 6
10 4 Youngest C 7
11 4 Youngest D 7
12 4 Youngest E 7
13 4 Youngest F 7

The idea here is to expand this table by adding as many columns as there are levels. In this case we have 4 levels so we want to add 4 new columns to the table.

This is desired output

id current_level name parent_id level_1 level_2 level_3 level_4
1 1 Oldest A NULL Oldest A NULL NULL NULL
2 1 Oldest B NULL Oldest B NULL NULL NULL
3 2 Old A 1 Oldest A Old A NULL NULL
4 2 Old B 2 Oldest B Old B NULL NULL
5 3 Young A 3 Oldest A Old A Young A NULL
6 3 Young B 4 Oldest B Old B Young B NULL
7 3 Young C 4 Oldest B Old B Young C NULL
8 4 Youngest A 5 Oldest A Old A Young A Youngest A
9 4 Youngest B 6 Oldest B Old B Young B Youngest B
10 4 Youngest C 7 Oldest B Old B Young C Youngest C
11 4 Youngest D 7 Oldest B Old B Young C Youngest D
12 4 Youngest E 7 Oldest B Old B Young C Youngest E
13 4 Youngest F 7 Oldest B Old B Young C Youngest F

The two main challenges I am facing here:

  1. Number of levels can be dynamic, let's say 1-30 (in this example I presented there are only 4 levels but it is possible that input data will have more levels, e.g. 25, so it would be ideal to make this query dynamic for any number of levels, not just 4)
  2. On levels 3 and higher we need to look up more than just immediate parent - basically we need to go through the whole lookup chain until we hit root element (parent_id == null).

For example, to populate levels for row with id = 5 I am using logic like this:

  • if column_level > current_level then SET to NULL => column level_4 = NULL
  • if column_level == current_level then SET to current_level => column level_3 = Young A
  • if column_level == current_level-1 then SET to parent_level => column level_2 = Old A
  • if column_level == current_level-2 then SET to parent_of_parent_level => column level_1 = Oldest A
  • we hit root level because we are on level_1 and parent_of_parent_of_parent doesn't exist => row number 5 is done

lookup chain for row with id=5 3(parent)->1(parent_of_parent)->NULL


What I tried:

I have tried to approach this with recursive cte but those don't allow to recursively append new columns or at least I couldn't.

I also started thinking about solution in TSQL but no luck either.


Solution

  • You can use a recursive CTE for this. To get the parent levels you just need to concat them together in the recursive part, using either JSON or XML so you can parse it back out as a list.

    It's made substantially more difficult by the fact you want them all as separate columns. You need to therefore use a giant PIVOT inside an APPLY to get all thirty columns (I haven't bothered renaming them, they are all just 1 2 3 etc).

    WITH cte AS (
        SELECT
          t.id,
          t.current_level,
          t.name,
          t.parent_id,
          levels = STRING_ESCAPE(t.name, 'json')
        FROM YourTable t
        WHERE current_level = 1
    
        UNION ALL
    
        SELECT
          t.id,
          t.current_level,
          t.name,
          t.parent_id,
          cte.levels + '","' + STRING_ESCAPE(t.name, 'json')
        FROM cte
        JOIN YourTable t ON t.parent_id = cte.id
    )
    SELECT
      cte.id,
      cte.current_level,
      cte.name,
      cte.parent_id,
      levels.*
    FROM cte
    CROSS APPLY (
        SELECT *
        FROM (
            SELECT
              [key] = CAST([key] AS int) + 1,
              value
            FROM OPENJSON('["' + cte.levels + '"]')
        ) o
        PIVOT (
            MAX(value) FOR [key] IN (
              [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30]
            )
        ) p
    ) levels;
    

    If you don't mind just getting one big JSON blob you can remove the APPLY and PIVOT

    WITH cte AS (
        SELECT
          t.id,
          t.current_level,
          t.name,
          t.parent_id,
          levels = STRING_ESCAPE(t.name, 'json')
        FROM YourTable t
        WHERE current_level = 1
    
        UNION ALL
    
        SELECT
          t.id,
          t.current_level,
          t.name,
          t.parent_id,
          cte.levels + '","' + STRING_ESCAPE(t.name, 'json')
        FROM cte
        JOIN YourTable t ON t.parent_id = cte.id
    )
    SELECT
      cte.id,
      cte.current_level,
      cte.name,
      cte.parent_id,
      levels = '["' + cte.levels + '"]'
    FROM cte;
    

    db<>fiddle