Search code examples
phpalgorithmchartsgraph-theoryundirected-graph

Most efficient algorithm to count all paths which require N steps in an Undirected Graph


Consider the following graph:

dummy graph

Represented by the following array structure:

$graph = array
(
    'a' => array(),
    'b' => array('a'),
    'c' => array('a', 'b'),
    'd' => array('a'),
    'e' => array('d'),
    'f' => array('a', 'b', 'c', 'd'),
    'g' => array('d'),
    'h' => array('c'),
    'i' => array('c', 'g'),
    'j' => array(),
);

What is the most efficient algorithm to find all paths (not just the shortest one) from node X to node Y in either direction without repeating nodes? For instance, the paths that lead from node C to node A are:

C --> A
C --> B --> A
C --> F --> A
C --> F --> B --> A
C --> F --> D --> A
C --> I --> G --> D --> A

Finding all the paths using the parent nodes of node X (A and B, in the example for node C) is trivial, but I am having a really hard time traversing the nodes in a descendant / hybrid direction.


UPDATE: Following @JackManey advice, I tried to port IDDFS (Iterative Deepening Depth-First Search) based on the Wikipedia pseudo-code and this is more or less what my code looks like:

$graph = directed2Undirected($graph);

function IDDFS($root, $goal) {
    $depth = 0;

    while ($depth <= 2) { // 2 is hard-coded for now
        $result = DLS($root, $goal, $depth);

        if ($result !== false) {
            return $result;
        }

        $depth++;
    }
}

function DLS($node, $goal, $depth) {
    global $graph;

    if (($depth >= 0) && ($node == $goal)) {
        return $node;
    }

    else if ($depth > 0) {
        foreach (expand($node, $graph) as $child) {
            return DLS($child, $goal, $depth - 1);
        }
    }

    else {
        return false;
    }
}

And here are the helper functions used by it:

function directed2Undirected($data) {
    foreach ($data as $key => $values) {
        foreach ($values as $value) {
            $data[$value][] = $key;
        }
    }

    return $data;
}

function expand($id, $data, $depth = 0) {
    while (--$depth >= 0) {
        $id = flatten(array_intersect_key($data, array_flip((array) $id)));
    }

    return array_unique(flatten(array_intersect_key($data, array_flip((array) $id))));
}

function flatten($data) {
    $result = array();

    if (is_array($data) === true) {
        foreach (new RecursiveIteratorIterator(new RecursiveArrayIterator($data)) as $value) {
            $result[] = $value;
        }
    }

    return $result;
}

Calling the above yields weird or incomplete results:

var_dump(IDDFS('c', 'a')); // a -- only 1 path?
var_dump(IDDFS('c', 'd')); // NULL -- can't find this path?!

I think I'm overlooking something from the pseudo-code, but I'm not sure what it is.


I also tried this DFS class that was recommended in another question, although it seems to always find one path from node X to node Y, I can't get it to return all paths (demo for C -> A and C -> D).


Since I don't need to know the path actually taken, only how many paths exist that require n number of steps to get from node X to node Y, I came up with this function (uses directed2Undirected above):

$graph = directed2Undirected($graph);

function Distance($node, $graph, $depth = 0) {
    $result = array();

    if (array_key_exists($node, $graph) === true) {
        $result = array_fill_keys(array_keys($graph), 0);

        foreach (expand($node, $graph, $depth - 1) as $child) {
            if (strcmp($node, $child) !== 0) {
                $result[$child] += $depth;
            }
        }

        $result[$node] = -1;
    }

    return $result;
}

function expand($id, $data, $depth = 0) {
    while (--$depth >= 0) {
        $id = flatten(array_intersect_key($data, array_flip((array) $id)));
    }

    // no array_unique() now!
    return flatten(array_intersect_key($data, array_flip((array) $id)));
}

For Distance('c', $graph, 0), Distance('c', $graph, 1) and Distance('c', $graph, 2) this correctly returns the sum of the distance between C and any other node. The problem is, with Distance('c', $graph, 3) (and higher) it start repeating nodes and returning wrong results:

Array
(
    [a] => 12
    [b] => 9
    [c] => -1
    [d] => 9
    [e] => 3
    [f] => 12
    [g] => 3
    [h] => 3
    [i] => 6
    [j] => 0
)

The index a should only be 6 (3 + 3), since the only ways I can get from C to A using 3 steps are:

C --> F --> B --> A
C --> F --> D --> A

Yet, it seems to be considering two (only?) additional paths that repeat nodes:

C --> A --> C --> A
C --> B --> C --> A
C --> F --> C --> A
C --> H --> C --> A
C --> I --> C --> A

Of course, index a isn't the only wrong one. The problem seems to be expand() but I'm not sure how to fix it, array_diff(expand('c', $graph, $i), expand('c', $graph, $i - 2)) seems to fix this particular error, but that isn't a proper fix.


dummy graph again again, so you don't have to scroll


Solution

  • In general, you can do a depth-first search or a breadth-first search, although neither one is superior to the other (since it's easy to come up with examples for which one is superior to the other).

    Edit: Upon rereading the question and thinking a bit, since you want all paths from C to A, a DFS starting at C would probably make the most sense. Along the way, you'd have to store sequences of edges and throw sequences away if they don't end up at A.