Search code examples
mysqlsqlrecursive-querychildren

How to SUM all subchildren tree in SQL recursively?


Good day, I've been pulling my hair on this problem for a while ><"

I have 4 categories in a tree structure.

tenant_category_transaction_view:

enter image description here

I would like to have the sum of all children "sumSubtotal" on every category

Something like that: enter image description here

I've managed to come very close... But there's something I don't get ><"

with recursive cte (sumSubtotal, sumQuantity, id, idParentCategory, treeSum, depth) as (

        select  root.sumSubtotal, -- STEP 1
                root.sumQuantity, 
                root.id, 
                root.idParentCategory, 
                root.sumSubtotal as treeSum,
                0 as depth
        from    tenant_category_transaction_view as root

        union all -- LOOP THROUGH ALL ROOT ROWS AND ADD ROWS TO THE CTE WITH THE INNER JOIN

        select  child.sumSubtotal, -- STEP 3
                child.sumQuantity, 
                child.id, 
                child.idParentCategory, 
                (cte.treeSum + child.sumSubtotal) AS treeSum,
                (cte.depth + 1) AS depth
        from    tenant_category_transaction_view AS child

        inner join cte on child.idParentCategory = cte.id -- STEP 2
)
select sumSubtotal, sumQuantity, id, idParentCategory, treeSum, depth -- STEP 4
from cte

Result of the above query:

enter image description here

It seems I'm generating the correct treeSum but upside down in only one branch

Would you be so kind to give me a hand?

Thank you for your time :)


Solution

  • I've updated the fiddle to include the exact schema / data provided in the question, including the null issues. It also includes an example of my suggested changes.

    The solution basically takes the given data and transforms it internally (in the CTE term nodes) so that the 2 top level category rows link to a common row, with id 0, so that the original logic I provided can be used to treat this as one hierarchical list of categories.

    First, we find all the branch lists recursively. Each branch is identified by a corresponding root. Then, we aggregate the node quantities for each root/branch.

    The fiddle

    WITH RECURSIVE nodes AS (
             SELECT id, COALESCE(idParentCategory, 0) AS idParentCategory
                  , sumSubtotal, sumQuantity
               FROM tenant_category_transaction_view
              UNION
             SELECT 0, null, 0, 0
         )
       , cte AS (
            SELECT t.*, t.id AS root
                 , idParentCategory AS idParentCategory0
                 , sumSubtotal      AS sumSubtotal0
                 , sumQuantity      AS sumQuantity0
              FROM nodes AS t
             UNION ALL
            SELECT t.* , t0.root
                 , t0.idParentCategory0
                 , t0.sumSubtotal0
                 , t0.sumQuantity0
              FROM cte AS t0
              JOIN nodes AS t
                ON t.idParentCategory = t0.id
         )
    SELECT root
         , MIN(idParentCategory0)   AS idParentCategory
         , MIN(sumSubtotal0)        AS sumSubtotal
         , MIN(sumQuantity0)        AS sumQuantity
         , SUM(t1.sumSubtotal)      AS total
      FROM cte AS t1
     GROUP BY root
     ORDER BY root
    ;
    

    The result:

    root idParentCategory sumSubtotal sumQuantity total
    0 null 0 0 9890
    1 0 9800 98 9800
    4 0 20 1 90
    5 4 30 1 70
    6 5 40 1 40

    The setup:

    CREATE TABLE tenant_category_transaction_view (
        id               int primary key
      , idParentCategory int
      , sumSubtotal      int
      , sumQuantity      int
    );
    
    INSERT INTO tenant_category_transaction_view VALUES
        (1, null, 9800, 98)
      , (4, null,   20,  1)
      , (5,    4,   30,  1)
      , (6,    5,   40,  1)
    ;
    

    The following uses a suggested slight adjustment to the original table and data. Instead of the 2 top null parent references for rows with id 1 and 4, add a top row (with id 99, for instance) and let rows with id 1 and 4 refer to that with parent = 99.

    WITH RECURSIVE cte AS (
            SELECT t.*, t.id AS root
              FROM tenant_category_transaction_view AS t
             UNION ALL
            SELECT t.*, t0.root
              FROM cte AS t0
              JOIN tenant_category_transaction_view AS t
                ON t.idParentCategory = t0.id
         )
    SELECT root
         , MIN(t2.idParentCategory) AS idParentCategory
         , MIN(t2.sumSubtotal)      AS sumSubtotal
         , MIN(t2.sumQuantity)      AS sumQuantity
         , SUM(t1.sumSubtotal)      AS total
      FROM cte AS t1
      JOIN tenant_category_transaction_view AS t2
        ON t1.root = t2.id
     GROUP BY root
     ORDER BY root
    ;
    

    The result:

    root idParentCategory sumSubtotal sumQuantity total
    99 null 0 0 9890
    1 99 9800 98 9800
    4 99 20 1 90
    5 4 30 1 70
    6 5 40 1 40

    Additionally, this can be written to aggregate based on t2.id, which is the primary key, allowing slight simplification, due to functional dependence.

    WITH RECURSIVE cte AS (
            SELECT t.*, t.id AS root
              FROM tenant_category_transaction_view AS t
             UNION ALL
            SELECT t.*, t0.root
              FROM cte AS t0
              JOIN tenant_category_transaction_view AS t
                ON t.idParentCategory = t0.id
         )
    SELECT t2.id
         , t2.idParentCategory
         , t2.sumSubtotal
         , t2.sumQuantity
         , SUM(t1.sumSubtotal)      AS total
      FROM cte AS t1
      JOIN tenant_category_transaction_view AS t2
        ON t1.root = t2.id
     GROUP BY t2.id
     ORDER BY t2.id
    ;
    

    Finally, we can remove the last JOIN by carrying other root values within the recursive logic:

    WITH RECURSIVE cte AS (
            SELECT t.*, t.id AS root
                 , idParentCategory AS idParentCategory0
                 , sumSubtotal      AS sumSubtotal0
                 , sumQuantity      AS sumQuantity0
              FROM tenant_category_transaction_view AS t
             UNION ALL
            SELECT t.* , t0.root
                 , t0.idParentCategory0
                 , t0.sumSubtotal0
                 , t0.sumQuantity0
              FROM cte AS t0
              JOIN tenant_category_transaction_view AS t
                ON t.idParentCategory = t0.id
         )
    SELECT root
         , MIN(idParentCategory0)   AS idParentCategory
         , MIN(sumSubtotal0)        AS sumSubtotal
         , MIN(sumQuantity0)        AS sumQuantity
         , SUM(t1.sumSubtotal)      AS total
      FROM cte AS t1
     GROUP BY root
     ORDER BY root
    ;
    

    The setup:

    DROP TABLE IF EXISTS tenant_category_transaction_view;
    CREATE TABLE tenant_category_transaction_view (
        id               int primary key
      , idParentCategory int
      , sumSubtotal      int
      , sumQuantity      int
    );
    
    INSERT INTO tenant_category_transaction_view VALUES
        (99, null,    0,  0)
      , ( 1,   99, 9800, 98)
      , ( 4,   99,   20,  1)
      , ( 5,    4,   30,  1)
      , ( 6,    5,   40,  1)
    ;