Search code examples
sqlpostgresqlcommon-table-expressionbusiness-intelligence

How can I write a SQL query to calculate the quantity of components sold with their parent assemblies? (Postgres 11/recursive CTE?)


My goal

To calculate the sum of components sold as part of their parent assemblies.

I'm sure this must be a common use case, but I haven't yet found documentation that leads to the result I'm looking for.

Background

I'm running Postgres 11 on CentOS 7. I have some tables like as follows:

CREATE TABLE the_schema.names_categories (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),
    thing_name TEXT NOT NULL, 
    thing_category TEXT NOT NULL
);

CREATE TABLE the_schema.relator (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),
    parent_name TEXT NOT NULL, 
    child_name TEXT NOT NULL,
    child_quantity INTEGER NOT NULL 
);

CREATE TABLE the_schema.sales (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),    
    sold_name TEXT NOT NULL,
    sold_quantity INTEGER NOT NULL
);

And a view like so, which is mainly to associate the category key with relator.child_name for filtering:

CREATE VIEW the_schema.relationships_with_child_catetgory AS (
    SELECT 
    r.parent_name,
    r.child_name, 
    r.child_quantity,
    n.thing_category AS child_category
    FROM 
    the_schema.relator r 
    INNER JOIN 
    the_schema.names_categories n 
    ON r.child_name = n.thing_name 
);

And these tables contain some data like this:

INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component');

INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2);

INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2);

I need to construct a query that, given these data, will return something like the following:

 child_name | sum_sold 
------------+----------
 subChild1  |       30
 subChild2  |        6
(2 rows)

The problem is that I haven't the first idea how to go about this and in fact it's getting scarier as I type. I'm having a really hard time visualizing the connections that need to be made, so it's difficult to get started in a logical way. Usually, Molinaro's SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but near as I can tell, none of them serve this particular purpose.

Based on my research on this site, it seems like I probably need to use a recursive CTE /Common Table Expression, as demonstrated in this question/answer, but I'm having considerable difficulty understanding this method and how to use this it for my case.

Aping the example from E. Brandstetter's answer linked above, I arrive at:

WITH RECURSIVE cte AS (
    SELECT 
    s.sold_name,
    r.child_name, 
    s.sold_quantity AS total 
    FROM 
    the_schema.sales s
    INNER JOIN 
    the_schema.relationships_with_child_catetgory r 
    ON s.sold_name = r.parent_name

    UNION  ALL

    SELECT 
    c.sold_name, 
    r.child_name, 
    (c.total  *  r.child_quantity)
    FROM 
    cte c 
    INNER JOIN 
    the_schema.relationships_with_child_catetgory r
    ON r.parent_name = c.child_name
) SELECT * FROM cte 

which gets part of the way there:

 sold_name | child_name | total 
-----------+------------+-------
 parent1   | child1     |     1
 parent1   | child1     |     2
 parent1   | subChild1  |    10
 parent1   | subChild1  |    20
 parent1   | subChild2  |     2
 parent1   | subChild2  |     4
(6 rows)

However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component' to both parts, the query returns no rows:

 sold_name | child_name | total 
-----------+------------+-------
(0 rows)

and when I try to group/aggregate, it gives the following error: ERROR: aggregate functions are not allowed in a recursive query's recursive term

I'm stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I'm failing to comprehend how this recursive CTE works. All guidance is appreciated!


Solution

  • Basically you have the solution. If you stored the quantities and categories in your CTE as well, you can simply add a WHERE filter and a SUM aggregation afterwards:

    SELECT
        child_name,
        SUM(sold_quantity * child_quantity)
    FROM cte
    WHERE category = 'component'
    GROUP BY child_name
    

    My entire query looks like this (which only differs in the details I mentioned above from yours):

    demo:db<>fiddle

    WITH RECURSIVE cte AS (
        SELECT 
            s.sold_name,
            s.sold_quantity,
            r.child_name,
            r.child_quantity,
            nc.thing_category as category
        FROM 
            sales s
        JOIN relator r
        ON s.sold_name = r.parent_name
        JOIN names_categories nc
        ON r.child_name = nc.thing_name
        
        UNION ALL
        
        SELECT
            cte.sold_name,
            cte.sold_quantity,
            r.child_name,
            r.child_quantity,
            nc.thing_category
        FROM cte
        JOIN relator r ON cte.child_name = r.parent_name
        JOIN names_categories nc
        ON r.child_name = nc.thing_name
    )
    SELECT
        child_name,
        SUM(sold_quantity * child_quantity)
    FROM cte
    WHERE category = 'component'
    GROUP BY child_name
    

    Note: I didn't use your view, because I found it more handy to fetch the data from directly from the tables instead of joining data I already have. But that's just the way I personally like it :)