Search code examples
sql-server-2005hierarchycoalesce

SQL Query: Hierarchical Coalesce


I have a table that defines a hierarchy:

Create Table [example] (
    id          Integer   Not Null Primary Key,
    parentID    Integer       Null,
    largeData1  nVarChar(max) Null,
    largeData2  nVarChar(max) Null);
    -- largeData3...n also exist

Insert Into [example] (id, parentID, largeData1, largeData2)
Select 1, null, 'blah blah blah', null          Union
Select 2,    1, null,             null          Union
Select 3,    1, 'foo bar foobar', null          Union
Select 4,    3, null,             'lorem ipsum' Union
Select 5,    4, null,             null;

Hierarchy diagram for this data:

Hierarchy diagram

I want to write a query that will return a single row for any given [id] value. The row should contain that row's [id] and [parentID] information. It should also contain the [largeData1...n] fields. However, if a largeData field is null, then it should traverse up the hierarchy until a non-null value for that field is encountered. It should, in short, function like the coalesce function, except across a hierarchy of rows instead of a set of columns.

Example:

Where [id] = 1:

id:          1
parentID:    null
largeData1:  blah blah blah
largeData2:  null

Where [id] = 2

id:          1
parentID:    1
largeData1:  blah blah blah
largeData2:  null

Where [id] = 3

id:          3
parentID:    1
largeData1:  foo bar foobar
largeData2:  null

Where [id] = 4

id:          4
parentID:    3
largeData1:  foo bar foobar
largeData2:  lorem ipsum

Where [id] = 5

id:          5
parentID:    4
largeData1:  foo bar foobar
largeData2:  lorem ipsum

So far, I have this:

Declare @id Integer; Set @id = 5;

With heirarchy
    (id, parentID, largeData1, largeData2, [level])
As (
    Select id, parentID, largeData1,
           largeData2, 1 As [level]
    From example
    Where id = @id

    Union All

    Select parent.id, parent.parentID,
           parent.largeData1,
           parent.largeData2,
           child.[level] + 1 As [level]
    From example As parent
    Inner Join heirarchy As child
        On parent.id = child.parentID)

Select id, parentID,
   (Select top 1 largeData1
    From heirarchy
    Where largeData1 Is Not Null
    Order By [level] Asc) As largeData1,

   (Select top 1 largeData2
    From heirarchy
    Where largeData2 Is Not Null
    Order By [level] Asc) As largeData2

From example
Where [id] = @id;

This returns the results that I am looking for. However, according to the query plan, it is making a separate pass through the hierarchy for each largeData field that I pull back.

How can I make this more efficient?

This is obviously a simplified version of a more complex problem. The final query will return data in XML format, so any solutions involving the FOR XML clause are perfectly fine.

I can create a CLR aggregate function for this, if doing so would help. I have not yet explored that route.


Solution

  • I came up with this:

    DECLARE @Id  int
    
    SET @Id = 5
    
    
    ;WITH cte (Id, ParentId, SaveParentId, LargeData1, LargeData2)
     as (--  The "anchor", your target Id
         select
            ex.Id
           ,ex.ParentId
           ,ex.ParentId  SaveParentId  --  Not changed throughout the CTE
           ,ex.LargeData1
           ,ex.LargeData2
          from Example ex
          where ex.Id = @Id
         union all select
                     cte.Id
                    ,ex.ParentId
                    ,cte.SaveParentId  --  Not changed throughout the CTE
                     --  These next are only "reset" if they are null and a not-null
                     --  value was found at this level 
                    ,isnull(ex.LargeData1, cte.LargeData2)  
                    ,isnull(ex.LargeData2, cte.LargeData2)
          from Example ex
           inner join cte
            on cte.ParentId = ex.Id)
     select
       Id
      ,SaveParentId     ParentId
      ,max(LargeData1)  LargeData1
      ,max(LargeData2)  LargeData2
     from cte
     group by Id, SaveParentId
    

    Basically, start at your target node and walk up the tree, replacing your null columns with not-null values if and when they are found.

    (Sorry, but I don't do XML on weekends.)