Search code examples
graphancestorleast-common-ancestor

Finding best common ancestor of two leaf nodes where nodes have zero, one, or two parents


Goal:

I am looking for an algorithm to find the best common ancestor of a graph where nodes in the graph can have zero, one, or two parents. I am not sure of the terminology of "best common ancestor": better terminology might be "lowest common ancestor", or "recent common ancestor", etc. If there is better terminology then please provide URL's that describe such.

The algorithm has access to the full graph data structure.

It is possible for a given node to have zero, one, or two parents. This is key because the algorithms I've seen on the web assume that a given node has either zero or one parents, but not two parents (see references below). For instance, the m1 node in the diagram below has zero parents as it is the root (there can be multiple roots of the graphs). d3 has two parents, one is d2 and the other b2.

Nodes have references to both parents if they exist, and references to all children, if they exist, so traversal up the tree and down the tree is fair game. Nodes can have zero or more children. Changing the data structure is not an option.

Nodes closer to the two input nodes are preferable than nodes farther away (i.e., closer to roots of the graph).

By example, one possible graph is shown by the diagram given below. In this scenario, the inputs to the algorithm would be nodes b5 and d4. The best common ancestor of nodes b5 and d4 is b2. c2 would not be because b3 is in the lineage leading to b5.

Possible answers for the algorithm can be at most one node, and the empty set is a valid answer in the case that there is no common ancestor of the two input nodes.

Reference Material

Tarjan's off-line least common ancestors algorithm seems to imply zero or one parents, so if that is the solution, then the answer should include a description of how two parents are accounted for in that algorithm. The wikipedia page for Lowest common ancestor also seems to only account for data structures whose nodes have zero or one parents, not two:

In a tree data structure where each node points to its parent, ...

Diagram:

graph diagram


Solution

  • I run a genealogy website, and I've solved this problem before with the following algorithm.

    For both nodes, use recursion to generate an array which links node name with generation. Using your example, b4 is 1 generation above b5; b3 is 2 generations; etc:

    $b5Tree = array('b4'=>1, 'b3'=>2, 'c3'=>2, 'b2'=>3, 'c2'=>3, ...);
    $d4Tree = array('d3'=>1, 'b2'=>2, 'd2'=>2, 'b1'=>3, 'd1'=>3, ...);
    

    The base case is to check if the first node appears in the second's tree, and vice-versa. If it exists, then you have your answer.

    Otherwise, iterate through one of the trees, finding common node IDs, and keeping track of the minimum generation.

    $minNodeID = null;
    foreach ($b5Tree as $nID => $gen)
    {
        if (($d4Tree[$nID] != 0) and (($d4Tree[$nID]  + $gen) < $minSummedGen))
        {
            $minSummedGen = $d4Tree[$nID] + $gen;
            $minNodeID = $nID;
        }
    }
    return $minNodeID;