Search code examples
mysqlsqlnested-sets

Understanding SQL query for finding item count for each category nested set model


I've been reading this and have understood all the queries until the one for counting items in each category. I have pasted the query and the result here.

    SELECT parent.name, COUNT(product.name)
    FROM nested_category AS node ,
            nested_category AS parent,
            product
    WHERE node.lft BETWEEN parent.lft AND parent.rgt
            AND node.category_id = product.category_id
    GROUP BY parent.name
    ORDER BY node.lft;

    +----------------------+---------------------+
    | name                 | COUNT(product.name) |
    +----------------------+---------------------+
    | ELECTRONICS          |                  10 |
    | TELEVISIONS          |                   5 |
    | TUBE                 |                   2 |
    | LCD                  |                   1 |
    | PLASMA               |                   2 |
    | PORTABLE ELECTRONICS |                   5 |
    | MP3 PLAYERS          |                   2 |
    | FLASH                |                   1 |
    | CD PLAYERS           |                   2 |
    | 2 WAY RADIOS         |                   1 |
    +----------------------+---------------------+

What exactly is going on? There are two copies of the nested_category table named node and parent. In the case of the example, there are 10 items total so there are 10 "nodes" selected since node.category_id = product.category_id. Then there exists at least one parent for each node. But I do not understand the COUNT(product.name) part of the query. How does that give you the item count for each category? I'm not very skilled in sql.

EDIT: I have found a flaw in this code. It will not display categories with zero items. This seems to fix it, but I do not understand it fully either.


Solution

  • Take a look at the GROUP BY clause - this explains the count, I'll run it through:

    Ignoring the actual SELECT for a moment, the first part of the FROM clause is the left hand input nested_category which is aliased as 'parent'

    This is then joined onto itself nested_category as the right hand input and aliased as node, but the clause gives all rows from the right hand input that have left/right values between the left hand inputs left/right values:

    You end up with something like this (selecting all the relevant columns):

    Parent.Id    Parent.Left  Parent.Right  Node.Left  Node.Right
    1             1            10            2          3
    1             1            10            4          9
    1             1            10            5          6
    1             1            10            7          8
    2             4            9             5          6
    2             4            9             7          8
    

    Notice how you get the Parent item multiple times (parent * number of children)

    This would represent the following hierarchy:

      1/10 + 
           |-> 2/3
           |-> 4/9 +
                   |-> 5/6
                   |-> 7/8
    

    Essentially for every row in the left input you get all the child rows in the right input

    The grouping then kicks in - the GROUP BY clause groups by the left hand input (Parent.Name which is probably not the best thing to group on unless Parent.Name is unique!) and then counts the number of rows in the right hand input - so in the case of the above it would give you

    Parent ID 1 (item 1/10) = 4 rows
    Parent ID 2 (item 4/9) = 2 rows
    

    When you GROUP (or aggregate) you can only select items that appear in the GROUP BY clause otherwise you must use an aggregate function such as SUM, COUNT, AVG etc

    Edit: The reason it give you none of the categories with 0 items is that the JOIN says 'only return rows from the right input which have left/right values that are between any of the rows in the left input'. Anything that has nothing between the left/right values will be filtered out by the join

    Hope this helps explain it