Search code examples
sqlsql-server-2008-r2hierarchyproceduralset-based

How do I convert this procedural SQL code for generating a menu hierarchy to a SET based approach?


I have a SQL Server 2008 R2 database table containing hierarchy information in the following format:

MarketID    Time                    menuPath                                                       SID  MarketName
107397507   2012-11-18 13:00:00.000 \Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements   1   Meeting
107397508   2012-11-18 13:00:00.000 \Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements   1   Plan
145556789   2012-11-20 12:00:00.000 \Project 2\Phase 3\Training\Date 20 November                    3   Verbal
145686775   2012-11-20 15:00:00.000 \Project 2\Phase 4\Testing\Date 20 November                     3   Structural
145686776   2012-11-20 15:00:00.000 \Project 2\Phase 4\Testing\Date 20 November                     3   Optical

The desired hierarchy output is this:

ID  ParentID    Depth   Name          MarketID
1   0           0       Project 1        NULL
2   1           1       Phase 1          NULL
3   2           2       Project Mgmt     NULL
4   3           3       18 November      NULL
5   4           4       Requirements     NULL
6   5           5       Meeting          107397507
7   5           5       Plan             107397508
8   0           0       Project 2        NULL
9   8           1       Phase 3          NULL
10  9           2       Training         NULL
11  10          3       20 November      NULL
12  11          4       12:00 Verbal     145556789
13  8           1       Phase 4          NULL
14  13          2       Testing          NULL
15  14          3       20 November      NULL
16  15          4       15:00 Structural 145686775
17  15          4       15:00 Optical    145686776

Note: That the word 'Date' has been removed from the node '18 November'

  • Edit: Only unique parent-child nodes are outputted e.g., there is only one 'Project 1\Phase 1' node, but there are two '20 November' nodes: one is 'Training\20 November' and the other is 'Testing\20 November'.
  • Edit: For all nodes with SID = 3 the Time must be added to the last node e.g., 'Testing\20 November\15:00 Optical'
  • Edit: The exact depth contained in the menuPath field can vary.

I have been able to create the following procedural SQL query which accomplishes this, but does anyone know how I might convert it to an equivalent set based approach?

/* Begin build of Menu table */
Declare @marketid int
Declare @Time DATETIME

DECLARE @StrMenu NVARCHAR(MAX)
DECLARE @SID INT

DECLARE @StrMarketName NVARCHAR(MAX)

DECLARE @selection VARCHAR(MAX)
DECLARE @parentname VARCHAR(MAX)
DECLARE @parentid INT
DECLARE @depth INT
DECLARE @boolDate INT
DECLARE @EIND INT 

DECLARE @Part NVARCHAR(MAX)
DECLARE @IND    INT

DECLARE cur CURSOR LOCAL for
    SELECT MarketID, Time, menuPath, SID, MarketName FROM test.dbo.Markets

OPEN cur
fetch next from cur into @marketid, @Time, @StrMenu, @SID, @StrMarketName

while @@FETCH_STATUS = 0 BEGIN
    SET @IND = CHARINDEX('\',@StrMenu)
    -- if the last character is not a \ then append it to the string
    IF RIGHT(@StrMenu,1) != '\'
        BEGIN
            SET @StrMenu = @StrMenu + '\'
        END

    IF @SID = 3
        BEGIN
        -- IF SID = 3 then append the Time to the MarketName
            SET @StrMarketName = (convert(varchar(5), @Time, 108)) + ' ' + @StrMarketName
        END

    SET @StrMenu = @StrMenu + @StrMarketName + '\'

    Set @EIND = 0
    SET @boolDate = 0
    SET @depth = 0

    WHILE(@IND != LEN(@StrMenu))
        BEGIN
            SET  @EIND = ISNULL(((CHARINDEX('\', @StrMenu, @IND + 1)) - @IND - 1), 0)
            SET @selection = (SUBSTRING(@StrMenu, (@IND  + 1),  @EIND))

            IF @depth = 0
                BEGIN
                    SET @parentid = 0
                END
            IF @depth > 0
                BEGIN
                    SET @parentid = (SELECT TOP 1 ID FROM test.dbo.Menu WHERE NAME = @parentname ORDER BY ID DESC )
                END

            IF (@selection LIKE '%Date%')
                BEGIN
                    SET @boolDate = 1
                    SET @selection = REPLACE(@selection, 'Date ', '')
                    SET @parentid = (SELECT ID FROM test.dbo.Menu WHERE NAME = @parentname )
                    -- insert values into the menu table
                    IF NOT EXISTS (SELECT NAME FROM test.dbo.Menu WHERE NAME = @selection AND ParentID = @parentid)
                        INSERT INTO test.dbo.Menu (ParentID, Depth, Name)
                        Values (@parentid, @depth, @selection)
                END

            -- only continue if the selection and its parent combination does not already exist
            IF NOT EXISTS (SELECT ID FROM test.dbo.Menu WHERE NAME = @selection AND ParentID = @parentid) AND @boolDate = 0
                BEGIN                   
                    IF (LEN(@StrMenu) = @EIND + @IND + 1)
                        BEGIN
                            -- If the current loop is the last loop then insert the MarketID
                            INSERT INTO test.dbo.Menu (ParentID, Depth, Name, MarketID)
                            Values (@parentid, @depth, @selection, @marketid)
                        END
                    Else
                        BEGIN
                            -- Otherwise only insert the basic info into the menu table
                            INSERT INTO test.dbo.Menu (ParentID, Depth, Name)
                            Values (@parentid, @depth, @selection)
                        END
                END

            SET @boolDate = 0
            -- increment the index values and set the parent name for the next loop
            SET @IND = ISNULL(CHARINDEX('\', @StrMenu, @IND + 1), 0)
            SET @depth = @depth + 1
            SET @parentname = @selection
        END
    fetch next from cur into @marketid, @Time, @StrMenu, @SID, @StrMarketName
END

close cur
deallocate cur

I have written this SQL to extract the hierarchy information from the menuPath column. Depending on the SID number the MarketName information and the Time columns are also appended to this menuPath e.g., if SID = 1 then only the MarketName is appended, but if SID = 3 then the Time and the MarkeName are both appended.

The MarketID is only added to the menu table for MarketName nodes.

An example of the table schema and the data that I'm working with is below:

    USE [test]
GO

CREATE TABLE [dbo].[Markets](
    [MarketID] [int] PRIMARY KEY NOT NULL,
    [Time] [datetime] NULL,
    [menuPath] [varchar](255) NULL,
    [SID] [int] NULL,
    [MarketName] [varchar](255) NULL
    )

CREATE TABLE [dbo].[Menu](
    [ID] [int] PRIMARY KEY IDENTITY,
    [ParentID] [int] NOT NULL,
    [Depth] [int] NOT NULL,
    [Name] [varchar] (255) NOT NULL,
    [MarketID] [int] NULL
 )

INSERT Markets (MarketID, Time, menuPath, SID, MarketName)
SELECT 107397507, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements', 1, 'Meeting'
UNION ALL SELECT 107397508, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements', 1, 'Plan'
UNION ALL SELECT 107397509, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements', 1, 'Write Up'
UNION ALL SELECT 107397513, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Building 1', 1, 'Plan'
UNION ALL SELECT 107397514, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Building 1', 1, 'Write Up'
UNION ALL SELECT 107397533, '2012-11-19 14:30:00.000', '\Project 1\Phase 1\Project Mgmt\Date 19 November\Building 2', 1, 'Plan'
UNION ALL SELECT 107397537, '2012-11-19 14:30:00.000', '\Project 1\Phase 1\Project Mgmt\Date 19 November\Building 2', 1, 'Write Up'
UNION ALL SELECT 107398573, '2012-11-20 09:00:00.000', '\Project 1\Phase 1\Installation\Date 20 November\Building 3', 1, 'Plan'
UNION ALL SELECT 107398574, '2012-11-20 09:00:00.000', '\Project 1\Phase 1\Installation\Date 20 November\Building 3', 1, 'Write Up'
UNION ALL SELECT 108977458, '2012-11-21 10:00:00.000', '\Project 1\Phase 2\Setup\Date 21 November\Building 4', 1, 'Prep'
UNION ALL SELECT 108977459, '2012-11-21 10:00:00.000', '\Project 1\Phase 2\Setup\Date 21 November\Building 4', 1, 'Clear'
UNION ALL SELECT 145556788, '2012-11-20 12:00:00.000', '\Project 2\Phase 3\Training\Date 20 November', 3, 'Written'
UNION ALL SELECT 145556789, '2012-11-20 12:00:00.000', '\Project 2\Phase 3\Training\Date 20 November', 3, 'Verbal'
UNION ALL SELECT 145686775, '2012-11-21 15:00:00.000', '\Project 2\Phase 4\Testing\Date 21 November', 3, 'Structural'
UNION ALL SELECT 145686776, '2012-11-21 15:00:00.000', '\Project 2\Phase 4\Testing\Date 21 November', 3, 'Optical'

Solution

  • I would use the XML extensions to split the path into its component parsts, you can also use xml extensions to get the position of each xml element, which combined with ROW_NUMBER (to account for an empty node at the start) gives you your depth field:

    WITH Menus AS
    (   SELECT  m.MarketID,
                [Name] = y.value('.', 'nvarchar(max)'),
                [Depth] = ROW_NUMBER() OVER(PARTITION BY MarketID ORDER BY y.value('for $i in . return count(../*[. << $i]) + 1', 'int')) - 1
        FROM    Markets m
                CROSS APPLY (VALUES (CAST('<x><y>' + REPLACE(menuPath, '\', '</y><y>') + '</y><y>' + CASE WHEN SID = 3 THEN CONVERT(VARCHAR(5), [Time], 8) + ' ' ELSE '' END +  marketName + '</y></x>' AS XML))) a (x)
                CROSS APPLY x.nodes('/x/y') b (y)
        WHERE   y.value('.', 'nvarchar(max)') != ''
    )
    SELECT  MarketID,
            [Name] = CASE WHEN LEFT(Name, 5) = 'Date ' THEN STUFF(Name, 1, 5, '') ELSE Name END,
            Depth
    FROM    Menus
    

    Example of Split on SQL Fiddle

    Just as a side note your table structure will end up with redundant information, depth could be obtained by counting the number of recursions to get back to a top level parent, or if marketID was stored in all rows the top level parent could be obtained by finding depth = 0. So the output of the above query should give you everything you need. But I'll continue none the less.

    The first step will be to insert all items into the menu table with 0 as the parent.

    WITH Menus AS
    (   SELECT  m.MarketID,
                [Name] = y.value('.', 'nvarchar(max)'),
                [Depth] = ROW_NUMBER() OVER(PARTITION BY MarketID ORDER BY y.value('for $i in . return count(../*[. << $i]) + 1', 'int')) - 1
        FROM    Markets m
                 CROSS APPLY (VALUES (CAST('<x><y>' + REPLACE(menuPath, '\', '</y><y>') + '</y><y>' + marketName + '</y></x>' AS XML))) a (x)
                CROSS APPLY x.nodes('/x/y') b (y)
        WHERE   y.value('.', 'nvarchar(max)') != ''
    )
    INSERT Menu (ParentID, Depth, Name, MarketID) 
    SELECT  [ParentID] = 0,
            Depth,
            [Name] = CASE WHEN LEFT(Name, 5) = 'Date ' THEN STUFF(Name, 1, 5, '') ELSE Name END,
            MarketID
    FROM    Menus
    

    Then update the market table with the right parent IDs

    UPDATE  Menu
    SET     ParentID = p.ID
    FROM    Menu c
            INNER JOIN 
            (   SELECT  ID, MarketID, Depth
                FROM    Menu 
            ) p
                ON c.MarketID = p.MarketID
                AND c.Depth = p.Depth + 1
    

    The final step is to set the Market ID to null for all but the base menu:

    WITH CTE AS
    (   SELECT  *,  
                [maxDepth] = MAX(Depth) OVER(PARTITION BY MarketID)
        FROM    Menu
    )
    UPDATE  CTE
    SET     MarketID = NULL
    WHERE   MaxDepth != Depth;
    

    And voila, you have your desired result.

    Example on SQL Fiddle


    ADDENDUM

    This seems to work:

    CREATE TABLE #TempMenu (MarketID INT, Name VARCHAR(200) NOT NULL, Depth INT NOT NULL);
    WITH Menus AS
    (   SELECT  m.MarketID,
                [Name] = y.value('.', 'nvarchar(max)'),
                [Depth] = ROW_NUMBER() OVER(PARTITION BY MarketID ORDER BY y.value('for $i in . return count(../*[. << $i]) + 1', 'int')) - 1
        FROM    Markets m
                 CROSS APPLY (VALUES (CAST('<x><y>' + REPLACE(menuPath, '\', '</y><y>') + '</y><y>' + CASE WHEN SID = 3 THEN CONVERT(VARCHAR(5), [Time], 8) + ' ' ELSE '' END +  marketName + '</y></x>' AS XML))) a (x)
                CROSS APPLY x.nodes('/x/y') b (y)
        WHERE   y.value('.', 'nvarchar(max)') != ''
    )
    INSERT #TempMenu (MarketID, name, Depth)
    SELECT  MarketID,
            [Name] = CASE WHEN LEFT(Name, 5) = 'Date ' THEN STUFF(Name, 1, 5, '') ELSE Name END,
            Depth
    FROM    Menus;
    
    CREATE TABLE #TempPaths 
    (   ID          INT NOT NULL, 
        ParentID    INT NOT NULL, 
        Depth       INT NOT NULL, 
        Name        VARCHAR(200) NOT NULL, 
        MarketID    INT NULL, 
        ParentPath  VARCHAR(200) NULL, 
        CurrentPath VARCHAR(200) NULL
    );
    
    WITH Paths AS
    (   SELECT  MarketID,
                [Name] = CASE WHEN LEFT(Name, 5) = 'Date ' THEN STUFF(Name, 1, 5, '') ELSE Name END,
                Depth,
                [MaxDepth] = MAX(Depth) OVER(PARTITION BY MarketID),
                [ParentPath] = (    SELECT  '/' + Name
                                    FROM    #TempMenu p
                                    WHERE   p.MarketID = c.MarketID
                                    AND     p.Depth < c.Depth
                                    FOR XML PATH(''), TYPE
                                ).value('.', 'nvarchar(max)'),
                [CurrentPath] = (   SELECT  '/' + Name
                                    FROM    #TempMenu p
                                    WHERE   p.MarketID = c.MarketID
                                    AND     p.Depth <= c.Depth
                                    FOR XML PATH(''), TYPE
                                ).value('.', 'nvarchar(max)')
        FROM    #TempMenu c
    ), Paths2 AS
    (   SELECT  DISTINCT [ParentID] = 0, Depth, Name, [MarketID] = NULL, [ParentPath], [CurrentPath]
        FROM    Paths
        WHERE   MaxDepth != Depth
        UNION 
        SELECT  0, Depth, Name, MarketID, [ParentPath], [CurrentPath]
        FROM    Paths
        WHERE   MaxDepth = Depth
    )
    -- USE MERGE CONDITION THAT WILL NEVER MATCH, ALLOWS ACCESS TO VALUES NOT BEING INSERTED IN THE OUTPUT CLAUSE
    MERGE INTO Menu m USING Paths2 p ON 1 = 0  
    WHEN NOT MATCHED THEN
        INSERT (ParentID, Depth, Name, MarketID)
        VALUES (p.ParentID, p.Depth, p.Name, p.MarketID)
        OUTPUT inserted.ID, inserted.ParentID, inserted.Depth, inserted.Name, inserted.MarketID, p.ParentPath, p.CurrentPath INTO #TempPaths;
    
    UPDATE  Menu
    SET     ParentID = rel.ParentID
    FROM    Menu
            INNER JOIN
            (   SELECT  [ChildID] = c.ID, [ParentID] = p.ID
                FROM    #TempPaths c
                        INNER JOIN #TempPaths p
                            ON c.ParentPath = p.CurrentPath
            ) rel
                ON rel.ChildID = Menu.ID;
    
    DROP TABLE #TempMenu, #TempPaths;
    

    To explain roughly what is going on I have used the same method as above to split the paths into their component parts, putting these into a temp table (for performance reasons), the split paths are then combined into full paths again and put into another temporary table, these paths will be used later to match parent and child records.

    The next part uses the merge statement to insert the records to the menu table, this is used because the Identities from the insert need to be matched to the full paths, and when using INSERT the OUTPUT only allows access to the inserted values, and not other columns from the source.

    Finally once all the records are inserted the temp table of paths can be used to match parent and child records (based on the path), and update the menu table.

    This seems quite a convoluted method, but it is entirely set based so should out perform the procedural approach.