sql sql-server recursion aggregate-functions common-table-expression

Aggregate functions in a recursive CTE to calculate fractions of sub-groups


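I have a self-referential table, which I am designing to describe mixtures of ingredients. Each row is one ingredient of a mixture: either a raw input or another mixture (the parent input).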

id  raw_input  parent_input  amount
a   x          NULL          4
a   y          NULL          6
b   j          NULL          1
b   k          NULL          3
c   NULL       a             6
c   NULL       b             1
d   NULL       c             1
d   NULL       a             1
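
To make the example reproducible, here is a minimal setup sketch. The table name test_mix is borrowed from the query in the solution; the column types are assumptions. FLOAT is chosen for amount so that division is not integer division and so that the anchor and recursive parts of a recursive CTE produce the same column type.

```sql
-- Hypothetical setup script: column types are assumptions, not part of the original post.
CREATE TABLE test_mix (
    id           VARCHAR(10) NOT NULL,  -- the mixture being described
    raw_input    VARCHAR(10) NULL,      -- set when the ingredient is a raw material
    parent_input VARCHAR(10) NULL,      -- set when the ingredient is another mixture
    amount       FLOAT       NOT NULL   -- quantity of the ingredient in the mixture
);

INSERT INTO test_mix (id, raw_input, parent_input, amount) VALUES
    ('a', 'x',  NULL, 4),
    ('a', 'y',  NULL, 6),
    ('b', 'j',  NULL, 1),
    ('b', 'k',  NULL, 3),
    ('c', NULL, 'a',  6),
    ('c', NULL, 'b',  1),
    ('d', NULL, 'c',  1),
    ('d', NULL, 'a',  1);
```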

I'd like to write a recursive CTE query that calculates the fraction of each raw input in each individual mix. For example, I'd like the output:

id raw_input amount
a x 0.4
a y 0.6
b j 0.25
b k 0.75
c x 0.34285714
c y 0.51428571
c j 0.03571429
c k 0.10714286
  • I haven't added mixture d here as it's quite tricky to calculate at this stage

where the values are calculated as follows:

id raw_input amount
a x 4/(4+6)
a y 6/(4+6)
b j 1/4
b k 3/4
c x 0.4*(6/(6+1))
c y 0.6*(6/(6+1))
c j 0.25*(1/(6+1))
c k 0.75*(1/(6+1))
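
The single-level rows (mixtures a and b) can be checked without any recursion. The sketch below assumes a table named test_mix with the columns shown above and a FLOAT amount; with an integer amount the division would truncate. The window total is computed before filtering, so a mixture that combines raw inputs and parent inputs is still divided by its full total.

```sql
-- Sketch only: fraction of each raw input within its own mixture, one level deep.
SELECT id, raw_input, fraction
FROM (
    SELECT id,
           raw_input,
           amount / SUM(amount) OVER (PARTITION BY id) AS fraction
    FROM test_mix
) AS t
WHERE raw_input IS NOT NULL;
```

On the sample data this should reproduce the a and b rows of the desired output; the c rows are what require the recursive step.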

My approach was to join an aggregate total onto the tables inside the CTE, then divide the masses by it:

WITH cte AS (
    SELECT id, base_input, mass_fraction FROM
        (SELECT E.id, E.base_input, E.amount/root_totals.total_mass AS mass_fraction
        FROM mix_table E
        JOIN (SELECT id, SUM(amount) as total_mass
             FROM mix_table
             GROUP BY id
             ) AS root_totals
        ON root_totals.id = E.id
        WHERE E.base_input IS NOT NULL) AS r
        UNION ALL
        
        SELECT b.id, b.base_input, b.mass_fraction/branch_totals.total_mass FROM 
        (SELECT F.id, cte.base_input, cte.mass_fraction*F.amount AS mass_fraction
         FROM mix_table F 
         JOIN cte on F.parent_input = cte.id) as b
         JOIN (SELECT id, SUM(amount) as total_mass
              FROM mix_table
              GROUP BY id      -- SQL Server does not allow GROUP BY/aggregates in the recursive member
         ) AS branch_totals
         ON branch_totals.id = b.id
    )

select * from cte

Running it without the totals joined onto the CTE gets most of the way there; the only problem is that the individual components of mixture c are not scaled by their respective fractions.

A CTE with an aggregate function seems to be exactly what I need, but the error raised by SQL Server prevents me from using one.

There must be a way around this; I'm sure I'm not the first person to want to do this.

Edit: To clarify, I'm aiming to do this for more than one level, so the solution should extend to an indeterminate depth of nested parents and children.


Solution

  • I managed to solve my own quandary. For future viewers: I think this only works because the aggregation doesn't need to be run more than once, so I can calculate the totals before the recursive CTE and then use them within it:

    -- I first define the total aggregation here
    
    WITH totals AS (SELECT id, SUM(amount) as total_mass
                  FROM test_mix
                  GROUP BY id),
    cte AS (
        SELECT id, raw_input, amount FROM
            (SELECT E.id, E.raw_input, E.amount/totals.total_mass AS amount
            FROM test_mix E
            JOIN totals
            ON totals.id = E.id
            WHERE E.raw_input IS NOT NULL) AS r
            UNION ALL
            
            SELECT b.id, raw_input, child_q*(parent_q/totals.total_mass) FROM 
            (SELECT F.id, cte.raw_input, cte.amount AS parent_q, F.amount AS child_q
             FROM test_mix F 
             JOIN cte on F.parent_input = cte.id) as b
             JOIN totals                                   -- then use it here
             ON totals.id = b.id
        )
    
    SELECT id, raw_input, SUM(amount) AS amount
    FROM cte
    GROUP BY id,raw_input
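
The same idea can be written without the nested derived tables. This is only a sketch of an equivalent formulation, not a different method: it assumes the test_mix table from the query above with a FLOAT amount column, and the CTE name shares is introduced here purely for illustration. Each row's share of its own mixture is computed once, outside the recursion, so the recursive member contains no aggregate at all.

```sql
-- Sketch: pre-compute every row's share of its mixture, then multiply shares down the tree.
WITH totals AS (
    SELECT id, SUM(amount) AS total_mass
    FROM test_mix
    GROUP BY id
),
shares AS (   -- hypothetical helper CTE: one row per ingredient with its share of the mixture
    SELECT m.id, m.raw_input, m.parent_input,
           m.amount / t.total_mass AS share
    FROM test_mix AS m
    JOIN totals AS t ON t.id = m.id
),
cte AS (
    -- anchor: raw inputs contribute their own share directly
    SELECT id, raw_input, share AS amount
    FROM shares
    WHERE raw_input IS NOT NULL

    UNION ALL

    -- recursive step: a parent mixture passes each of its raw inputs down, scaled by its share
    SELECT s.id, cte.raw_input, s.share * cte.amount
    FROM shares AS s
    JOIN cte ON s.parent_input = cte.id
)
SELECT id, raw_input, SUM(amount) AS amount
FROM cte
GROUP BY id, raw_input
ORDER BY id, raw_input;
```

The final SUM matters for mixtures such as d, which reaches the same raw input through more than one path (directly through a, and through c). Working it by hand from the sample data, d comes out at roughly x ≈ 0.3714, y ≈ 0.5571, j ≈ 0.0179 and k ≈ 0.0536, which sum to 1 as expected.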