I'm working on a SSIS project to build a datawarehouse.
Here is my simple SSIS process built to load a dimension table (Locations). Location table contains a parent/child structure based on Nested sets (LEFT,RIGHT, LEVEL fields). So, Insert process works as well as update process.
However, I need to fill a calculated column (PARENT_ID) in order to store the parent relationship. So, for each row I need to use (LEFT, RIGHT and LEVEL columns) in a SQL statement to find the PARENT_ID value).
This is table structure of the source table :
So what is the best way to populate my parent_id column during loading process ? Should I build a dedicated data flow to this process ?
Can I use a SQL statement in a Derived column component to find the parent_id value ?
EDIT : Here is my SQL query to update my parent/child structure (parent_id field).
UPDATE
L1
SET L1.[PARENT_ID] = (
SELECT TOP 1 L2.ID
FROM [DW].[DimLocation] L2
WHERE L2.[TREE_LEFT_VALUE] < L1.[TREE_LEFT_VALUE] AND L2.[TREE_RIGHT_VALUE] > L1.[TREE_RIGHT_VALUE]
ORDER BY L2.[TREE_RIGHT_VALUE]-L1.[TREE_RIGHT_VALUE] ASC)
FROM [DW].[DimLocation] L1
WHERE [TREE_LEFT_VALUE] > 1
EDIT 2: Here is a diagram to represent a Nested Set structure and how to represent it in a database.
So, each node contains TREE_LEFT_VALUE
and TREE_RIGHT_VALUE
values that respectively bound their childs TREE_LEFT_VALUE
/TREE_RIGHT_VALUE
values.
It's made to find in a simple query all children of a specified node. For example if I want all children of the "ROOT 1" node, i need to query all nodes where TREE_LEFT_VALUE
is more than 2 and TREE_RIGHT_VALUE
less than 5.
Because I need to use parent_id structure in my datawarehouse (SSAS project to build an OLAP cube), I need to update a PARENT_ID field by adding the parent node ID. So, I need to query the parent node ID by using TREE_LEFT_VALUE
and TREE_RIGHT_VALUE
fields then to use the result in a update statement.
It works fine, but is there a more efficient way to do that ?
Thanks,
If the column is nullable, I would update it with a SQL task after your data flow. This will insure that all possible parent rows are inserted before you try to find the parentid - also, it will keep the process set based, rather than trying to do it with a lookup in a dataflow task.