I have a SQL Server 2008 R2 database table containing hierarchy information in the following format:
MarketID Time menuPath SID MarketName
107397507 2012-11-18 13:00:00.000 \Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements 1 Meeting
107397508 2012-11-18 13:00:00.000 \Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements 1 Plan
145556789 2012-11-20 12:00:00.000 \Project 2\Phase 3\Training\Date 20 November 3 Verbal
145686775 2012-11-20 15:00:00.000 \Project 2\Phase 4\Testing\Date 20 November 3 Structural
145686776 2012-11-20 15:00:00.000 \Project 2\Phase 4\Testing\Date 20 November 3 Optical
The desired hierarchy output is this:
ID ParentID Depth Name MarketID
1 0 0 Project 1 NULL
2 1 1 Phase 1 NULL
3 2 2 Project Mgmt NULL
4 3 3 18 November NULL
5 4 4 Requirements NULL
6 5 5 Meeting 107397507
7 5 5 Plan 107397508
8 0 0 Project 2 NULL
9 8 1 Phase 3 NULL
10 9 2 Training NULL
11 10 3 20 November NULL
12 11 4 12:00 Verbal 145556789
13 8 1 Phase 4 NULL
14 13 2 Testing NULL
15 14 3 20 November NULL
16 15 4 15:00 Structural 145686775
17 15 4 15:00 Optical 145686776
Note: That the word 'Date' has been removed from the node '18 November'
I have been able to create the following procedural SQL query which accomplishes this, but does anyone know how I might convert it to an equivalent set based approach?
/* Begin build of Menu table */
Declare @marketid int
Declare @Time DATETIME
DECLARE @StrMenu NVARCHAR(MAX)
DECLARE @SID INT
DECLARE @StrMarketName NVARCHAR(MAX)
DECLARE @selection VARCHAR(MAX)
DECLARE @parentname VARCHAR(MAX)
DECLARE @parentid INT
DECLARE @depth INT
DECLARE @boolDate INT
DECLARE @EIND INT
DECLARE @Part NVARCHAR(MAX)
DECLARE @IND INT
DECLARE cur CURSOR LOCAL for
SELECT MarketID, Time, menuPath, SID, MarketName FROM test.dbo.Markets
OPEN cur
fetch next from cur into @marketid, @Time, @StrMenu, @SID, @StrMarketName
while @@FETCH_STATUS = 0 BEGIN
SET @IND = CHARINDEX('\',@StrMenu)
-- if the last character is not a \ then append it to the string
IF RIGHT(@StrMenu,1) != '\'
BEGIN
SET @StrMenu = @StrMenu + '\'
END
IF @SID = 3
BEGIN
-- IF SID = 3 then append the Time to the MarketName
SET @StrMarketName = (convert(varchar(5), @Time, 108)) + ' ' + @StrMarketName
END
SET @StrMenu = @StrMenu + @StrMarketName + '\'
Set @EIND = 0
SET @boolDate = 0
SET @depth = 0
WHILE(@IND != LEN(@StrMenu))
BEGIN
SET @EIND = ISNULL(((CHARINDEX('\', @StrMenu, @IND + 1)) - @IND - 1), 0)
SET @selection = (SUBSTRING(@StrMenu, (@IND + 1), @EIND))
IF @depth = 0
BEGIN
SET @parentid = 0
END
IF @depth > 0
BEGIN
SET @parentid = (SELECT TOP 1 ID FROM test.dbo.Menu WHERE NAME = @parentname ORDER BY ID DESC )
END
IF (@selection LIKE '%Date%')
BEGIN
SET @boolDate = 1
SET @selection = REPLACE(@selection, 'Date ', '')
SET @parentid = (SELECT ID FROM test.dbo.Menu WHERE NAME = @parentname )
-- insert values into the menu table
IF NOT EXISTS (SELECT NAME FROM test.dbo.Menu WHERE NAME = @selection AND ParentID = @parentid)
INSERT INTO test.dbo.Menu (ParentID, Depth, Name)
Values (@parentid, @depth, @selection)
END
-- only continue if the selection and its parent combination does not already exist
IF NOT EXISTS (SELECT ID FROM test.dbo.Menu WHERE NAME = @selection AND ParentID = @parentid) AND @boolDate = 0
BEGIN
IF (LEN(@StrMenu) = @EIND + @IND + 1)
BEGIN
-- If the current loop is the last loop then insert the MarketID
INSERT INTO test.dbo.Menu (ParentID, Depth, Name, MarketID)
Values (@parentid, @depth, @selection, @marketid)
END
Else
BEGIN
-- Otherwise only insert the basic info into the menu table
INSERT INTO test.dbo.Menu (ParentID, Depth, Name)
Values (@parentid, @depth, @selection)
END
END
SET @boolDate = 0
-- increment the index values and set the parent name for the next loop
SET @IND = ISNULL(CHARINDEX('\', @StrMenu, @IND + 1), 0)
SET @depth = @depth + 1
SET @parentname = @selection
END
fetch next from cur into @marketid, @Time, @StrMenu, @SID, @StrMarketName
END
close cur
deallocate cur
I have written this SQL to extract the hierarchy information from the menuPath
column. Depending on the SID number the MarketName
information and the Time
columns are also appended to this menuPath
e.g., if SID = 1 then only the MarketName
is appended, but if SID = 3 then the Time
and the MarkeName
are both appended.
The MarketID
is only added to the menu table for MarketName
nodes.
An example of the table schema and the data that I'm working with is below:
USE [test]
GO
CREATE TABLE [dbo].[Markets](
[MarketID] [int] PRIMARY KEY NOT NULL,
[Time] [datetime] NULL,
[menuPath] [varchar](255) NULL,
[SID] [int] NULL,
[MarketName] [varchar](255) NULL
)
CREATE TABLE [dbo].[Menu](
[ID] [int] PRIMARY KEY IDENTITY,
[ParentID] [int] NOT NULL,
[Depth] [int] NOT NULL,
[Name] [varchar] (255) NOT NULL,
[MarketID] [int] NULL
)
INSERT Markets (MarketID, Time, menuPath, SID, MarketName)
SELECT 107397507, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements', 1, 'Meeting'
UNION ALL SELECT 107397508, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements', 1, 'Plan'
UNION ALL SELECT 107397509, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Requirements', 1, 'Write Up'
UNION ALL SELECT 107397513, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Building 1', 1, 'Plan'
UNION ALL SELECT 107397514, '2012-11-18 13:00:00.000', '\Project 1\Phase 1\Project Mgmt\Date 18 November\Building 1', 1, 'Write Up'
UNION ALL SELECT 107397533, '2012-11-19 14:30:00.000', '\Project 1\Phase 1\Project Mgmt\Date 19 November\Building 2', 1, 'Plan'
UNION ALL SELECT 107397537, '2012-11-19 14:30:00.000', '\Project 1\Phase 1\Project Mgmt\Date 19 November\Building 2', 1, 'Write Up'
UNION ALL SELECT 107398573, '2012-11-20 09:00:00.000', '\Project 1\Phase 1\Installation\Date 20 November\Building 3', 1, 'Plan'
UNION ALL SELECT 107398574, '2012-11-20 09:00:00.000', '\Project 1\Phase 1\Installation\Date 20 November\Building 3', 1, 'Write Up'
UNION ALL SELECT 108977458, '2012-11-21 10:00:00.000', '\Project 1\Phase 2\Setup\Date 21 November\Building 4', 1, 'Prep'
UNION ALL SELECT 108977459, '2012-11-21 10:00:00.000', '\Project 1\Phase 2\Setup\Date 21 November\Building 4', 1, 'Clear'
UNION ALL SELECT 145556788, '2012-11-20 12:00:00.000', '\Project 2\Phase 3\Training\Date 20 November', 3, 'Written'
UNION ALL SELECT 145556789, '2012-11-20 12:00:00.000', '\Project 2\Phase 3\Training\Date 20 November', 3, 'Verbal'
UNION ALL SELECT 145686775, '2012-11-21 15:00:00.000', '\Project 2\Phase 4\Testing\Date 21 November', 3, 'Structural'
UNION ALL SELECT 145686776, '2012-11-21 15:00:00.000', '\Project 2\Phase 4\Testing\Date 21 November', 3, 'Optical'
I would use the XML extensions to split the path into its component parsts, you can also use xml extensions to get the position of each xml element, which combined with ROW_NUMBER
(to account for an empty node at the start) gives you your depth
field:
WITH Menus AS
( SELECT m.MarketID,
[Name] = y.value('.', 'nvarchar(max)'),
[Depth] = ROW_NUMBER() OVER(PARTITION BY MarketID ORDER BY y.value('for $i in . return count(../*[. << $i]) + 1', 'int')) - 1
FROM Markets m
CROSS APPLY (VALUES (CAST('<x><y>' + REPLACE(menuPath, '\', '</y><y>') + '</y><y>' + CASE WHEN SID = 3 THEN CONVERT(VARCHAR(5), [Time], 8) + ' ' ELSE '' END + marketName + '</y></x>' AS XML))) a (x)
CROSS APPLY x.nodes('/x/y') b (y)
WHERE y.value('.', 'nvarchar(max)') != ''
)
SELECT MarketID,
[Name] = CASE WHEN LEFT(Name, 5) = 'Date ' THEN STUFF(Name, 1, 5, '') ELSE Name END,
Depth
FROM Menus
Example of Split on SQL Fiddle
Just as a side note your table structure will end up with redundant information, depth could be obtained by counting the number of recursions to get back to a top level parent, or if marketID was stored in all rows the top level parent could be obtained by finding depth = 0. So the output of the above query should give you everything you need. But I'll continue none the less.
The first step will be to insert all items into the menu table with 0
as the parent.
WITH Menus AS
( SELECT m.MarketID,
[Name] = y.value('.', 'nvarchar(max)'),
[Depth] = ROW_NUMBER() OVER(PARTITION BY MarketID ORDER BY y.value('for $i in . return count(../*[. << $i]) + 1', 'int')) - 1
FROM Markets m
CROSS APPLY (VALUES (CAST('<x><y>' + REPLACE(menuPath, '\', '</y><y>') + '</y><y>' + marketName + '</y></x>' AS XML))) a (x)
CROSS APPLY x.nodes('/x/y') b (y)
WHERE y.value('.', 'nvarchar(max)') != ''
)
INSERT Menu (ParentID, Depth, Name, MarketID)
SELECT [ParentID] = 0,
Depth,
[Name] = CASE WHEN LEFT(Name, 5) = 'Date ' THEN STUFF(Name, 1, 5, '') ELSE Name END,
MarketID
FROM Menus
Then update the market table with the right parent IDs
UPDATE Menu
SET ParentID = p.ID
FROM Menu c
INNER JOIN
( SELECT ID, MarketID, Depth
FROM Menu
) p
ON c.MarketID = p.MarketID
AND c.Depth = p.Depth + 1
The final step is to set the Market ID to null for all but the base menu:
WITH CTE AS
( SELECT *,
[maxDepth] = MAX(Depth) OVER(PARTITION BY MarketID)
FROM Menu
)
UPDATE CTE
SET MarketID = NULL
WHERE MaxDepth != Depth;
And voila, you have your desired result.
ADDENDUM
This seems to work:
CREATE TABLE #TempMenu (MarketID INT, Name VARCHAR(200) NOT NULL, Depth INT NOT NULL);
WITH Menus AS
( SELECT m.MarketID,
[Name] = y.value('.', 'nvarchar(max)'),
[Depth] = ROW_NUMBER() OVER(PARTITION BY MarketID ORDER BY y.value('for $i in . return count(../*[. << $i]) + 1', 'int')) - 1
FROM Markets m
CROSS APPLY (VALUES (CAST('<x><y>' + REPLACE(menuPath, '\', '</y><y>') + '</y><y>' + CASE WHEN SID = 3 THEN CONVERT(VARCHAR(5), [Time], 8) + ' ' ELSE '' END + marketName + '</y></x>' AS XML))) a (x)
CROSS APPLY x.nodes('/x/y') b (y)
WHERE y.value('.', 'nvarchar(max)') != ''
)
INSERT #TempMenu (MarketID, name, Depth)
SELECT MarketID,
[Name] = CASE WHEN LEFT(Name, 5) = 'Date ' THEN STUFF(Name, 1, 5, '') ELSE Name END,
Depth
FROM Menus;
CREATE TABLE #TempPaths
( ID INT NOT NULL,
ParentID INT NOT NULL,
Depth INT NOT NULL,
Name VARCHAR(200) NOT NULL,
MarketID INT NULL,
ParentPath VARCHAR(200) NULL,
CurrentPath VARCHAR(200) NULL
);
WITH Paths AS
( SELECT MarketID,
[Name] = CASE WHEN LEFT(Name, 5) = 'Date ' THEN STUFF(Name, 1, 5, '') ELSE Name END,
Depth,
[MaxDepth] = MAX(Depth) OVER(PARTITION BY MarketID),
[ParentPath] = ( SELECT '/' + Name
FROM #TempMenu p
WHERE p.MarketID = c.MarketID
AND p.Depth < c.Depth
FOR XML PATH(''), TYPE
).value('.', 'nvarchar(max)'),
[CurrentPath] = ( SELECT '/' + Name
FROM #TempMenu p
WHERE p.MarketID = c.MarketID
AND p.Depth <= c.Depth
FOR XML PATH(''), TYPE
).value('.', 'nvarchar(max)')
FROM #TempMenu c
), Paths2 AS
( SELECT DISTINCT [ParentID] = 0, Depth, Name, [MarketID] = NULL, [ParentPath], [CurrentPath]
FROM Paths
WHERE MaxDepth != Depth
UNION
SELECT 0, Depth, Name, MarketID, [ParentPath], [CurrentPath]
FROM Paths
WHERE MaxDepth = Depth
)
-- USE MERGE CONDITION THAT WILL NEVER MATCH, ALLOWS ACCESS TO VALUES NOT BEING INSERTED IN THE OUTPUT CLAUSE
MERGE INTO Menu m USING Paths2 p ON 1 = 0
WHEN NOT MATCHED THEN
INSERT (ParentID, Depth, Name, MarketID)
VALUES (p.ParentID, p.Depth, p.Name, p.MarketID)
OUTPUT inserted.ID, inserted.ParentID, inserted.Depth, inserted.Name, inserted.MarketID, p.ParentPath, p.CurrentPath INTO #TempPaths;
UPDATE Menu
SET ParentID = rel.ParentID
FROM Menu
INNER JOIN
( SELECT [ChildID] = c.ID, [ParentID] = p.ID
FROM #TempPaths c
INNER JOIN #TempPaths p
ON c.ParentPath = p.CurrentPath
) rel
ON rel.ChildID = Menu.ID;
DROP TABLE #TempMenu, #TempPaths;
To explain roughly what is going on I have used the same method as above to split the paths into their component parts, putting these into a temp table (for performance reasons), the split paths are then combined into full paths again and put into another temporary table, these paths will be used later to match parent and child records.
The next part uses the merge statement to insert the records to the menu table, this is used because the Identities from the insert need to be matched to the full paths, and when using INSERT
the OUTPUT
only allows access to the inserted values, and not other columns from the source.
Finally once all the records are inserted the temp table of paths can be used to match parent and child records (based on the path), and update the menu table.
This seems quite a convoluted method, but it is entirely set based so should out perform the procedural approach.