Search code examples
sqlpostgresqlrecursive-query

Recursive query to join Hierarchy Level table to count how many times the second level hierarchies appear in the divisision + Zipcode table


So, I have three SQL tables, Hierachy Level and ZipCode.

Example of tables: (Please bear in mind that this is just an example and that in reality there is N numbers of hierarchy levels in the system, so there's no way to write a fixed query)

Hierarchy Level:

ID     HierarchyName   FatherLevel
1      "top level"     NULL
2      "medium"        "top level" 
3      "lowest"        "medium"
4      "top level 2"   NULL
5      "lowest 2"      "top level 2"

Division:

ID    Name            HierachyLevel (I put the names here for ease of understanding but it'd be the IDs)
1     "National"      "top level"
2     "Regional"      "medium"
3     "Northeast"     "lowest"
4     "North"         "lowest"
5     "Southeast"     "lowest"
6     "International" "top level 2"
7     "Europe"        "lowest 2"
7     "Asia"          "lowest 2"

ZipCode:

ID   Code    Division (Same as before, it'd be the IDs instead of the actual names)
1    101     Southeast
2    102     Southeast
3    103     North
4    104     Northeast
5    105     Europe
6    106     Asia

So, what I want here is to see how many times the children of the level just below the top level appears in the ZipCode table appears.

In the example listed, for instance, there's medium appearing 4 times (right below top level, and lowest appearing 2 times (right below top level 2).

My final representation of the query I want to build would be:

   HierarchyName   Count
   "lowest 2"      2
   "medium"        4

This is a pretty loaded question and to be honest I'm not really sure if I'm explaining it well, sorry.

In simpler terms: I want to count how many times the children of the hiearchies right below the top level appear in the Zipcode table through their Divisions.

Edit: Only the lowest level of the hierarchy will appear on the Zipcode table, and what I want to count is how many times the children of the second level would appear on the Zipcode.


Solution

  • This fiddle has a solution.

    First get a count of zip_code by division

    with recursive divs as (
      select d.id as division_id, 
             d.name as division_name, 
             d.hierarchy_level_id,
             count(*) filter (where z.code is not null) as zip_count
        from division d
             left join zip_code z
                    on z.division_id = d.id
       group by d.id, d.name, d.hierarchy_level_id
    ), 
    

    Perform the recursion

    tree as (
      select h.id, h.hierarchy_name, h.father_id, d.division_name, d.zip_count,
             1 as tier, array[h.hierarchy_name] as hier_path
        from hierarchy_level h
             join divs d 
               on d.hierarchy_level_id = h.id
       where h.father_id is null
      union all
      select c.id, c.hierarchy_name, c.father_id, d.division_name, d.zip_count,
             f.tier + 1, f.hier_path||c.hierarchy_name
        from tree f
             join hierarchy_level c
               on c.father_id = f.id
             join divs d 
               on d.hierarchy_level_id = c.id
    )
    

    Join each level to all of its descendants using the @> "contains" operator to get the rolled-up sum() of zip_code values beneath each level.

    select t.id, t.hierarchy_name, t.father_id, t.division_name,
           t.zip_count, t.tier, t.hier_path,
           sum(coalesce(c.zip_count, 0)) + t.zip_count as total_zip_count
      from tree t
           left join tree c
                  on c.tier > t.tier
                 and c.hier_path @> t.hier_path
     group by t.id, t.hierarchy_name, t.father_id, t.division_name,
              t.zip_count, t.tier, t.hier_path
     order by t.hier_path;
    

    I left all the "show your work" information in the fiddle so you can see what is happening. You can cut it down to what you want by changing the select list, adding a where clause, and dropping t.division_name from both the select list and the group by.