Search code examples
sql-servert-sqlgroupingcommon-table-expressionhierarchy

Identify and label each branch in a hierarchical structure


I am creating a cleansing view for a ledger table where transactions are stored with hierarchical structure in the source system. I want to create an additional column 'Group_Id' identifying all the records which belong to the same transaction set.

My goal is to find the most efficient way to populate the Group_Id column like in the following example:

Entry_No Closed_By_Entry_No Group_Id
1 4 1
2 4 1
3 4 1
4 5 1
5 0 1
6 9 2
7 9 2
8 9 2
9 11 2
10 11 2
11 0 2
12 14 3
13 14 3
14 0 3
15 16 4
16 0 4
17 0 5
18 20 6
19 20 6
20 22 6
21 22 6
22 0 6

Where Closed_By_Entry_No is the Parent_Id and Entry_No is the Child_Id and the Parent_Id of the top-most Child_Id of each set is always 0 (zero).

The challenge for me is that I don't have one hierarchy in my dataset, I have hundreds of thousand of them. Typically each set is made of two records: 1 invoice + 1 payments (70% of the records in total), but then there are more elaborated situations like in the above example where I can have e.g. 15 invoices + 3 payments + 1 credit note. And I need to identify all of those sets in a way I can treat them as a single subject for further calculations.

The only idea I have now to solve this is feeding a cursor with only

SELECT Entry_No, Closed_By_Entry_No 
FROM table 
WHERE Closed_By_Entry_No = 0

Scroll this cursor through the resultset sending each candidate record to a stored procedure containing the recursive CTE to read the hierarchy from the original table and populating the Group_Id incrementally from the cursor index itself and storing the result in a temp table. Then continue from that temp table ....

There must be a simpler method that doesn't use either cursors or temporary tables.


Solution

  • It's a straightforward recursive common table expression:

    declare @LedgerEntries as Table ( Entry_No Int, Closed_By_Entry_No Int, Group_Id Int );
    insert into @LedgerEntries ( Entry_No, Closed_By_Entry_No, Group_Id ) values
      ( 1, 4, 1 ),
      ( 2, 4, 1 ),
      ( 3, 4, 1 ),
      ( 4, 5, 1 ),
      ( 5, 0, 1 ),
      ( 6, 9, 2 ),
      ( 7, 9, 2 ),
      ( 8, 9, 2 ),
      ( 9, 11, 2 ),
      ( 10, 11, 2 ),
      ( 11, 0, 2 ),
      ( 12, 14, 3 ),
      ( 13, 14, 3 ),
      ( 14, 0, 3 ),
      ( 15, 16, 4 ),
      ( 16, 0, 4 ),
      ( 17, 0, 5 ),
      ( 18, 20, 6 ),
      ( 19, 20, 6 ),
      ( 20, 22, 6 ),
      ( 21,  22, 6 ),
      ( 22, 0, 6 );
    
    select * from @LedgerEntries;
    
    with Hierarchy as (
      -- Anchor query.
      --   Start with the closed by zero ledger entries and assign a group id.
      select Entry_No, Closed_By_Entry_No, Group_Id,
        Row_Number( ) over ( order by Entry_No ) as NewGroupId
        from @LedgerEntries
        where Closed_By_Entry_No = 0
      union all
      -- Recursive query.
      --   Add each level, working backwards, of ledger entries and propagate the new group id.
      select LE.Entry_No, LE.Closed_By_Entry_No, LE.Group_Id, H.NewGroupId
        from Hierarchy as H inner join
          @LedgerEntries as LE on LE.Closed_By_Entry_No = H.Entry_No
      )
      select *
        from Hierarchy
        order by Entry_No;
    

    dbfiddle.