Search code examples
sqlpostgresqlcommon-table-expressionhierarchy

SQL: Join (Inner Join) with matching ancestor from another table


So I'm completely stumped trying to figure out this SQL query.

I have table of product categories that exist in a tree-like structure. To simplify let's say there are 3 top level categories: A, B, C. There is one more category above them ('All') which is the root. No products can be assigned to this category. To distinguish categories that can't be assigned to products, they have a type 'Abstract', as opposed to 'Concrete'

Each category can have any number and depth of sub-categories. I'm currently storing these with a parent id to the immediate parent (adjacency list).

Categories

Category   Parent    Type
All        None      Abstract
A          All       Concrete
B          All       Concrete
C          All       Concrete
D          A         Concrete
E          D         Concrete
F          B         Concrete
G          F         Concrete
H          C         Concrete
I          C         Concrete

I have another table of products with a category field. The only categories that appear in this table are the top level ones. ie. Either A, B, or C.

Products

Part Number       Category
XXXX-XXXX         A
XXXX-YYYY         A
XXXX-ZZZZ         B
YYYY-XXXX         C

I would like to create a query that joins the two tables, to create rows where the Category is replaced with the child category. ie. From a pseudo code standpoint basically join on category = provided category is either equal or a descendent of category.

So something like:

select * from products
inner join categories
on products.category = descendent of category

would result in:

Part Number       Category
XXXX-XXXX         E (E's top level concrete parent is A)
XXXX-YYYY         E (E's top level concrete parent is A)
YYYY-XXXX         H (H's top level concrete parent is C)
YYYY-XXXX         I (I's top level concrete parent is C)

I have this that retrieves all the concrete types up to the top level:

with recursive
concrete_parents as (
  select category, parent, type
  from categories
  where category in ('E', 'H', 'I')
  UNION ALL
    select t2.category, t2.parent, t2.type
    from categories as t2
    inner join concrete_parents t1
    on t1.parent = t2.category
    where t2.type = 'Concrete'
)

select distinct * from concrete_parents
order by parent;

I can't figure out how to combine this with a inner join on the main table?

Another alternative I'm considering is using a Postgres ltree but I'm not very familiar with it.

Any thoughts?


Solution

  • ... would be great to dynamically capture the top level concrete categories.

    That seems doable since you stated:

    The only categories that appear in this table (Products) are the top level ones, ie. A, B, or C.

    So those top level categories are filtered in the final JOIN automatically. And since those (and only those) have parent = 'All' according to your sample data, we can shave off one level of recursion and make it a bit faster, yet:

    WITH RECURSIVE parent_cat AS (
       SELECT category AS original, category, parent -- no need for type
       FROM   categories      c
       WHERE  category in ('A', 'D', 'H', 'I')
    
       UNION ALL
       SELECT pc.original, c.category, c.parent
       FROM   parent_cat pc
       JOIN   categories c ON c.category = pc.parent
       WHERE  pc.parent <> 'All'  -- stop at top level, save 1 recursion
       )
    SELECT p.part_number, pc.category, pc.original 
    FROM   parent_cat pc
    JOIN   products   p USING (category)
    WHERE  pc.parent = 'All'      -- redundant, but a bit faster
    ORDER  BY pc.original;
    

    Also, no need to filter with type = 'Concrete', as other types are filtered by the join as well as by pc.parent = 'All' already.

    db<>fiddle here

    BTW, if performance is critical and categories don't change too much, consider a MATERIALIZED VIEW replacing the rCTE parent_cat in the query - and implement an appropriate regime to keep it up to date.