The DFS I was taught in school goes something like this:
(* graph representation: ith element of the array is a list of successors of the node i *)
let graph_example = [|
[1; 2];
[3; 0; 2];
[0; 1];
[1]
|]
let dfs graph start_node =
let visited = Array.make (Array.length graph_example) false in
let rec dfs_rec node =
if not visited.(node) then begin
visited.(node) <- true;
(* Do something with node *)
Printf.printf "%d\n" node;
List.iter dfs_rec graph.(node)
end
in
dfs_rec start_node
Example graph:
This version is nice because it is simple, practical and it runs in O(|E| + |V|) (ie linear) time (|E| is the number of edges and |V| is the number of vertices).
However, it mutates the visited
array. From what I understand this is somewhat anti-idiomatic in Ocaml and should be avoided if there is a way to do so. So is there a way to write an immutable version?
I have taken a look at Lhooq's DFS from this question: https://stackoverflow.com/a/37738261
It is immutable, and it essentially (from what I understand) replaces the array with a list dictionary of all the visited nodes, and to check if a node was already visited it uses the function List.mem
.
The trouble is, List.mem
seems to be linear time(I am guessing here, it is not written in the docs, so I could be wrong), while the array access is O(1). so the total complexity would at least be O(|E| + |V|^2) (ie quadratic) time, which is a shame.
So is there a way to improve this or do you have to accept that the immutable version will be slower?
First, using a local mutable variable which is fully owned by a function is idiomatic in OCaml. It is global mutable variables used for long-range communication between various part of a program that should be avoided.
Second, code using List.mem
is not idiomatic, because mem
is a function that morally belongs to set
s.
Thus the straightforward translation of the mutable code would be to replace the visited array by a set of visited nodes:
module Node_set = Set.Make(Int)
let dfs graph start_node =
let rec dfs_rec visited node =
if Node_set.mem node visited then visited else
let visited = Node_set.add node visited in
List.fold_left dfs_rec visited graph.(node)
in
ignore (dfs_rec Node_set.empty start_node)
Then the complexity of the traversal depends on the complexity of the search in the set implementation which is at most at O(ln V) for standard implementation. This time constant can be improved by using specialized data structures. However, this logarithmic complexity is often sufficient. After all, ln V
is at most 64 for a graph that can be handled by the array version of the algorithm whereas the set version could be adapted to work with (sparse and lazy) graph of arbitrary size.