sql-server ssis sql-server-2012 nested-sets

What is the best way to update a calculated derived column during Slowly dimension update?

I'm working on a SSIS project to build a datawarehouse.

Here is my simple SSIS process built to load a dimension table (Locations). Location table contains a parent/child structure based on Nested sets (LEFT,RIGHT, LEVEL fields). So, Insert process works as well as update process.

enter image description here

However, I need to fill a calculated column (PARENT_ID) in order to store the parent relationship. So, for each row I need to use (LEFT, RIGHT and LEVEL columns) in a SQL statement to find the PARENT_ID value).

This is table structure of the source table : enter image description here

So what is the best way to populate my parent_id column during loading process ? Should I build a dedicated data flow to this process ?

Can I use a SQL statement in a Derived column component to find the parent_id value ?

EDIT : Here is my SQL query to update my parent/child structure (parent_id field).

UPDATE 
    L1
    SET L1.[PARENT_ID] = (
    SELECT TOP 1 L2.ID
           FROM [DW].[DimLocation] L2
           WHERE L2.[TREE_LEFT_VALUE] < L1.[TREE_LEFT_VALUE] AND L2.[TREE_RIGHT_VALUE] > L1.[TREE_RIGHT_VALUE]    
           ORDER BY L2.[TREE_RIGHT_VALUE]-L1.[TREE_RIGHT_VALUE] ASC)
FROM [DW].[DimLocation] L1
WHERE [TREE_LEFT_VALUE] > 1

EDIT 2: Here is a diagram to represent a Nested Set structure and how to represent it in a database. enter image description here

So, each node contains TREE_LEFT_VALUE and TREE_RIGHT_VALUE values that respectively bound their childs TREE_LEFT_VALUE/TREE_RIGHT_VALUE values.

It's made to find in a simple query all children of a specified node. For example if I want all children of the "ROOT 1" node, i need to query all nodes where TREE_LEFT_VALUE is more than 2 and TREE_RIGHT_VALUE less than 5.

Because I need to use parent_id structure in my datawarehouse (SSAS project to build an OLAP cube), I need to update a PARENT_ID field by adding the parent node ID. So, I need to query the parent node ID by using TREE_LEFT_VALUE and TREE_RIGHT_VALUE fields then to use the result in a update statement.

It works fine, but is there a more efficient way to do that ?

Thanks,

Solution

If the column is nullable, I would update it with a SQL task after your data flow. This will insure that all possible parent rows are inserted before you try to find the parentid - also, it will keep the process set based, rather than trying to do it with a lookup in a dataflow task.