Search code examples
sqlpostgresqltransitive-closure-tabletransitive-closureleast-common-ancestor

Finding Least Common Ancestor from a Transitive Closure Table


I have a table representing the transitive closure of an organizational hierarchy (i.e., its a tree with a single root):

create table ancestry (
    ancestor   integer,
    descendant integer,
    distance   integer
);

I have another table that contains the organizations that each user is allowed to access:

create table accessible (
    user         integer,
    organization integer
);

The system shows the user a roll-up of expenditures associated with each organization the user can access. I could always start by showing the user a view of the company (i.e., the root) showing the user a list of immediate child organizations and how much his organizations contribute to the total. In most cases, there would be a single child and the user would be required to drill-down several levels before seeing multiple children. I would prefer to start the presentation with the first organization that shows multiple children (i.e., the LCA).

For a given user, I can find the set of paths to the root easy enough but am having trouble finding the least common ancestor. I am using postgresql 9.1 but would prefer a solution that is database agnostic. In the worst case, I can pull the paths to root back into the application's code and calculate the LCA there.


Solution

  • I took a fresh look at this and developed the following solution. I used a common-table-expression to make it easier to understand how it operates but it could easily be written using a sub-query.

    with
    hit (id, count) as (
        select
            ancestry.ancestor
           ,count(ancestry.descendant)
        from
            accessible
            inner join ancestry
                on accessible.organization = ancestry.descendant
        where
            accessible.user = @user_id
        group by
            ancestry.ancestor
    )
    select
        ancestry.descendant as lca
    from
        hit
        inner join ancestry
            on ancestry.descendant = hit.id
           and ancestry.ancestor = @company_id
    order by
        hit.count desc
       ,ancestry.distance desc
    limit 1
    ;
    

    The hit CTE counts, for each organization in the hierarchy, the number of paths from a child to the root that traverse the organization. The LCA is then the organization with the most traversals. In the event of a tie, the organization farthest from the root (i.e., max(distance)) is the actual LCA. This is best illustrated with an example.

            A
            |
            B
           / \
          C   D
    

    Assuming we wish to find the LCA of nodes C and D from the tree above. The hit CTE produces the following counts:

    Node    Count
      A       2
      B       2
      C       1
      D       1
    

    The main query adds the distance:

    Node    Count    Distance
      A       2         0
      B       2         1
      C       1         2
      D       1         2
    

    The main query then orders the results by descending count and distance

    Node    Count    Distance
      B       2         1
      A       2         0
      C       1         2
      D       1         2
    

    The LCA is the first item in the list.