Search code examples
sqlsql-serversql-server-2005t-sqlhierarchical

What is the best way to group and aggregate and sum tree data?


Given a self referencing table

Item 
-------------
Id (pk)
ParentId (fk)

With a related table of associated values

ItemValue
-------------
ItemId (fk)
Amount

And some sample data

Item                       ItemValues 
Id      ParentId           ItemId      Amount
--------------------       ----------------------
1       null               1           10
2       1                  3           40
3       1                  3           20
4       2                  4           10
5       2                  5           30
6       null
7       6
8       7

I need a sproc to take Item.Id and return the direct children with sums of all ItemValue.Amounts for the them, their children and their children all the way down the tree.

For example, if 1 is passed in, the tree would be 2, 3, 4, 5 the direct children are 2, 3 the output would be

 ItemId    Amount
 ------------------
 2         40     (values from ItemIds 4 & 5)
 3         60     (values from ItemId 3)

What sort of approaches should be applied to make achieve this behavior?

I am considering using a CTE, but am wondering if there is a better/faster approach.


Solution

  • A recursive CTE like this would work, assuming your hierarchy doesn't go too deep:

    declare @ParentId int;
    set @ParentId = 1;
    
    ;with 
      Recurse as (
        select 
          a.Id as DirectChildId
        , a.Id
        from Item a 
        where ParentId = @ParentId
        union all
        select
          b.DirectChildId
        , a.Id
        from Item a 
        join Recurse b on b.Id = a.ParentId
        )
    select
      a.DirectChildId, sum(b.Amount) as Amount
    from Recurse a
    left join ItemValues b on a.Id = b.ItemId
    group by
      DirectChildId;
    

    A non-CTE method would require some form of iteration, cursor-based or otherwise. Since it's a stored proc, its a possibility, and if there's a lot data to recurse through, it would probably scale better, so long as you slice the data appropriately.

    If the clustered index is on Id, add a non-clustered index on ParentId. As a covering index, it will satisfy the initial seek w/out a bookmark lookup. The clustered index will then help with the recursive join.

    If the clustered index is already on ParentId instead, add a non-clustered index on Id. Together, they will be virtually equivalent to the above. For ItemValues, you may want a index on (ItemId) INCLUDE (Amount), if the actual table is wider than this.