Search code examples
sqlpostgresqlaggregatepostgresql-12

How to count related rows including subcategories?


I've got a few tables in a Postgres 12.3 database.

The first one is named category:

id|template_id|name           |entry_count|is_base_template|can_rename|can_delete|section|userId|parentCategoryId|
--|-----------|---------------|-----------|----------------|----------|----------|-------|------|----------------|
 1|           |Notes          |          0|true            |true      |true      |A      |      |                |
 2|           |ToDo           |          0|true            |true      |true      |A      |      |                |
 3|          1|Notes          |          0|false           |true      |true      |A      |     1|                |
 4|          2|ToDo           |          0|false           |true      |true      |A      |     1|                |
 5|           |Must Do        |          0|false           |          |          |A      |      |               4|
 6|           |Important notes|          0|false           |          |          |A      |      |               3|

The second table is called entry - which has no bearing on the question at hand.

And there is the link table category_entries_entry:

categoryId|entryId|
----------|-------|
         4|      1|
         5|      5|
         5|      6|
         4|      7|
         3|      8|
         6|      9|

A category can possess children, if parentCategoryId is not NULL then we're dealing with a child. For instance, the category with id = 5 is a subcategory of id = 4. Children can't have their own children, so only one level of nesting.

I need to count the number of entries for each category including subcategories.

This request does what I need, mostly. But it does not take into account the user:

SELECT COALESCE(c."parentCategoryId" , c.id) as cat_id , COUNT(*) as entries_in_cat
FROM category c JOIN
     category_entries_entry r
     ON c.id = r."categoryId" 
WHERE c.is_base_template = false
GROUP BY cat_id;

Returns:

cat_id|entries_in_cat|
------|--------------|
     4|             4|
     3|             2|

The category table has also userId and the count should only be executed for a given user. Notably, only root categories have entries for userId.

And I want to list subcategories with their counts additionally. So the desired output with userId = 1 for the given sample is:

cat_id|entries_in_cat|
------|--------------|
     5|             2|
     4|             4|
     6|             1|
     3|             2|

Here's a break down:

1) Category number 6 is a subcategory of the 3rd category it has 1 entry, so the result is correct.

2) Category number 3 is a category (that's to say, it does not have a parent), it contains 1 entry and another one should come from the 6th subcategory, that's 2 in total. Your script returns 1 which is wrong.

3) Category number 5 is a subcategory of the 4th category, it contains 2 entries. Your script returns also 2 which is right.

4) category number 4 is a category, it has 2 entries of its own and another two come from the 5th subcategory, that's 4 in total. Your script returns 2 which is wrong. It should return 4.

How can I achieve that?


Solution

  • This does the job for a single level of nesting:

    To list only root categories, counts include subcategories:

    WITH root AS (
       SELECT id AS cat_id, id AS sub_id
       FROM   category
       WHERE  is_base_template = false
       AND    "userId" = 1
       )
    SELECT c.cat_id, count(*)::int AS entries_in_cat
    FROM  (
       TABLE root
       UNION ALL
       SELECT r.cat_id, c.id
       FROM   root     r
       JOIN   category c ON c."parentCategoryId" = r.cat_id
       ) c
    JOIN   category_entries_entry e ON e."categoryId" = c.sub_id
    GROUP  BY c.cat_id;
    

    The point is to join on sub_id, but group by cat_id.

    To list root categories like above, and subcategories additionally:

    WITH root AS (
       SELECT id AS cat_id, id AS sub_id
       FROM   category
       WHERE  is_base_template = false
       AND    "userId" = 1
       )
    , ct AS (
       SELECT c.cat_id, c.sub_id, count(*)::int AS ct
       FROM  (
          TABLE root
          UNION ALL
          SELECT r.cat_id, c.id AS sub_id
          FROM   root     r
          JOIN   category c ON c."parentCategoryId" = r.cat_id
          ) c
       JOIN   category_entries_entry e ON e."categoryId" = c.sub_id
       GROUP  BY c.cat_id, c.sub_id
       )
    SELECT cat_id, sum(ct)::int AS entries_in_cat
    FROM   ct
    GROUP  BY 1
    
    UNION ALL
    SELECT sub_id, ct
    FROM   ct
    WHERE  cat_id <> sub_id;
    

    db<>fiddle here

    For an arbitrary number of nesting levels, use a recursive CTE. Example:

    About the optional short syntax TABLE parent: