Search code examples
sql-servert-sqlhierarchyid

Convert hierarchy defined by position to SQL hierarchy id defined


I have a lot of data from an old system which defines the data in a Bill of Materials by the position it exists in a table. The BoM data table coming from the old system looks like

    ID  level   ItemNumber
    1   1   TopItem
    2   .2  FirstChildOfTop
    3   .2  2ndChildofTop
    4   .2  3ChildOfTop
    5   ..3 1stChildof3ChildofTop
    6   ..3 2ndChildof3ChildofTop
    7   .2  4thChildofTop
    8   ..3 1stChildof4ChildTop
    9   ...4    1stChildof4ChildTop
    10  ..3 2ndChildof4ChildofTop
    11  .2  5thChildofTop
    12  ..3 1stChildof5thChildofTop
    13  ...4    1stChildof1stChildof5thChildofTop
    14  ..3 2ndChildof5thChildofTop
    15  1   2ndTopItem
    16  1   3rdTopItem

In my example the ID is consecutive, the real data the ID can be broken but always lowest to highest as that is how the hierarchy is defined.

By using some simple code to replace the level number with tabs we can get visual hierarchy

    1    TopItem
    2        FirstChildOfTop
    3        2ndChildofTop
    4        3ChildOfTop
    5            1stChildof3ChildofTo
    6            2ndChildof3ChildofTo
    7        4thChildofTop
    8            1stChildof4ChildTop
    9                1stChildof4ChildTop
    10           2ndChildof4ChildofTo
    11       5thChildofTop
    12           1stChildof5thChildof
    13               1stChildof1stChildof
    14           2ndChildof5thChildof
    15   2ndTopItem
    16   3rdTopItem

As I have about 5,000 of these lists and they are all between 25 and 55 thousand lines long, I need some code to convert this hierarchy to use sql HierarchyID so we can query at any level in the list. At the moment I hope my explanation shows, you have to work from the top to find in the Item is 2nd, 3rd or some other level and if it has any children. The items in the third column exist in a simple Item Master table but its role in a BoM is defined in these tables only.

I'd offer some code but all my attempts and conversion have failed miserably. I'd claim I'm OK a set based queries

The target is Microsoft SQL 2014 The primary aim is to data warehouse the data but enable to people to find sub-assemblies and where used.

Edit: In answer to Anthony Hancock's very pertinent question I did some work. Please consider the following

    ID  level   ItemNumber  sampH     lft       rgt
   1    1   TopItem   1/2           2           28        
   2    .2  FirstChildOfTop   1/2/3         3           4         
   3    .2  2ndChildofTop     1/2/3         5           6         
   4    .2  3ChildOfTop   1/2/3         7           11        
   5    ..3 1stChildof3ChildofTop     2/3/4         8           9         
   6    ..3 2ndChildof3ChildofTop     2/3/4         10          11        
   7    .2  4thChildofTop     1/2/3         13          20        
   8    ..3 1stChildof4ChildTop   2/3/4         14          17        
   9    ...4    1stChildof4ChildTop   3/4/5         15          16        
  10    ..3 2ndChildof4ChildofTop     2/3/4         18          19        
  11    .2  5thChildofTop     1/2/3/        20          25        
  12    ..3 1stChildof5thChildofTop   2/3/4         21          24        
  13    ...4    1stChildof1stChildof5thChildofTop     3/4/5         22          23        
  14    ..3 2ndChildof5thChildofTop   2/3/4         26          27        
  15    1   2ndTopItem    1/2           2           28        
  16    1   3rdTopItem    1/2           2           28        
  17    0   verytop   1/            1           29        

Apologies for the awful formatting
1) I have added at line 17 the item we are making - ie this BoM makes the 'verytop' item - so I have renumbered the 'level'
2) I have added in the column 'sampH' column my hand edited PathEnumeratedTree values
3) In the two columns 'lft' and 'rgt' I have added some identifiers of NestedSets data
Please forgive if my hand edited columns aren't correct.

My aim is to get a structure so that someone can query these many deep list to find where an item sits in the tree and what are its children. So I'm open to whatever works.

My testing of the NestedSets - so far - has shown I can do stuff like this:

-- Children of a given parent ItemNumber

Select c.itemnumber, ' is child of 2ndTopItem'
from [dbo].[Sample] as p, [dbo].[Sample] as c
where (c.lft between p.lft and p.rgt)
and (c.lft <> p.lft)
and p.ItemNumber = '2ndTopItem'

But I am completely open to any suggestions how to enumerate the tree structure.


Solution

  • Try the following code:

    declare @Source table (
        Id       int         ,
        [Level]  varchar(20) ,
        [Name]   varchar(50)
    );
    
    declare @Target table (
        Id        int          ,
        [Level]   int          ,
        [Name]    varchar(50)  ,
        ParentId  int          ,
        Hid       hierarchyid  ,
    
        primary key (Id),
        unique ([Level], Id),
        unique (ParentId, Id)
    );
    
    -- 1. The Test Data (Thanks Anthony Hancock for it)
    
    insert into @Source
    values
        (  1 , '1'    , 'TopItem'                            ),
        (  2 , '.2'   , 'FirstChildOfTop'                    ),
        (  3 , '.2'   , '2ndChildofTop'                      ),
        (  4 , '.2'   , '3ChildOfTop'                        ),
        (  5 , '..3'  , '1stChildof3ChildofTop'              ),
        (  6 , '..3'  , '2ndChildof3ChildofTop'              ),
        (  7 , '.2'   , '4thChildofTop'                      ),
        (  8 , '..3'  , '1stChildof4ChildTop'                ),
        (  9 , '...4' , '1stChildof4ChildTop'                ),
        ( 10 , '..3'  , '2ndChildof4ChildofTop'              ),
        ( 11 , '.2'   , '5thChildofTop'                      ),
        ( 12 , '..3'  , '1stChildof5thChildofTop'            ),
        ( 13 , '...4' , '1stChildof1stChildof5thChildofTop'  ),
        ( 14 , '..3'  , '2ndChildof5thChildofTop'            ),
        ( 15 , '1'    , '2ndTopItem'                         ),
        ( 16 , '1'    , '3rdTopItem'                         );
    
    
    -- 2. Insert the Test Data to the @Target table
    --    with converting of the Level column to int data type
    --    to use it as an indexed column in the query # 3
    --    (once there are millions of records, that index will be highly useful)
    
    insert into @Target (Id, [Level], [Name]) 
    select
        Id, 
        [Level] = cast(replace([Level],'.','') as int),
        [Name]
    from
        @Source
    
    
    -- 3. Calculate the ParentId column and update the @Target table 
    --    to use the ParentId as an indexed column in the query # 4
    
    update t set
        ParentId = (
            select top 1 Id 
            from @Target as p
            where p.Id < t.Id and p.[Level] < t.[Level]
            order by p.Id desc )
    from 
        @Target t;
    
    
    -- 4. Calculate the Hid column 
    --    based on the ParentId link and in accordance with the Id order
    
    with Recursion as
    (
        select
            Id       ,
            ParentId ,
            Hid      =  cast(
                            concat(
                                '/',
                                row_number() over (order by  Id),
                                '/'
                            ) 
                            as varchar(1000)
                        )
        from
            @Target
        where
            ParentId is null
    
        union all 
    
        select
            Id        =  t.Id       ,
            ParentId  =  t.ParentId ,
            Hid       =  cast(
                             concat(
                                 r.Hid, 
                                 row_number() over (partition by t.ParentId order by t.Id), 
                                 '/'
                             )
                             as varchar(1000)
                         )
        from 
            Recursion r        
            inner join @Target t on t.ParentId = r.Id 
    )
    update t set
        Hid = r.Hid
    from 
        @Target t
        inner join Recursion r  on r.Id = t.Id;
    
    
    -- 5. See the result ordered by Hid
    
    select 
        Id       ,
        [Level]  ,
        [Name]   ,
        ParentId ,
        Hid      ,
        HidPath  =  Hid.ToString()
    from 
        @Target 
    order by 
        Hid;
    

    Read more about Combination of Id-ParentId and HierarchyId Approaches to Hierarchical Data