Search code examples
ocamlimmutabilitydepth-first-search

Depth first search: is immutabilty and speed mutualy exclusive?


The DFS I was taught in school goes something like this:

(* graph representation: ith element of the array is a list of successors of the node i *)
let graph_example = [|
  [1; 2];
  [3; 0; 2];
  [0; 1];
  [1]
|]

let dfs graph start_node =
  let visited = Array.make (Array.length graph_example) false in
  let rec dfs_rec node =
    if not visited.(node) then begin
      visited.(node) <- true;
      (* Do something with node *)
      Printf.printf "%d\n" node;
      List.iter dfs_rec graph.(node)
    end
  in
  dfs_rec start_node

Example graph:

graph_example

This version is nice because it is simple, practical and it runs in O(|E| + |V|) (ie linear) time (|E| is the number of edges and |V| is the number of vertices).

However, it mutates the visited array. From what I understand this is somewhat anti-idiomatic in Ocaml and should be avoided if there is a way to do so. So is there a way to write an immutable version?

I have taken a look at Lhooq's DFS from this question: https://stackoverflow.com/a/37738261 It is immutable, and it essentially (from what I understand) replaces the array with a list dictionary of all the visited nodes, and to check if a node was already visited it uses the function List.mem. The trouble is, List.mem seems to be linear time(I am guessing here, it is not written in the docs, so I could be wrong), while the array access is O(1). so the total complexity would at least be O(|E| + |V|^2) (ie quadratic) time, which is a shame.

So is there a way to improve this or do you have to accept that the immutable version will be slower?


Solution

  • First, using a local mutable variable which is fully owned by a function is idiomatic in OCaml. It is global mutable variables used for long-range communication between various part of a program that should be avoided.

    Second, code using List.mem is not idiomatic, because mem is a function that morally belongs to sets.

    Thus the straightforward translation of the mutable code would be to replace the visited array by a set of visited nodes:

    module Node_set = Set.Make(Int)
    
    let dfs graph start_node =
      let rec dfs_rec visited node =
        if Node_set.mem node visited then visited else
          let visited = Node_set.add node visited in
          List.fold_left dfs_rec visited graph.(node)
      in
      ignore (dfs_rec Node_set.empty start_node)
    

    Then the complexity of the traversal depends on the complexity of the search in the set implementation which is at most at O(ln V) for standard implementation. This time constant can be improved by using specialized data structures. However, this logarithmic complexity is often sufficient. After all, ln V is at most 64 for a graph that can be handled by the array version of the algorithm whereas the set version could be adapted to work with (sparse and lazy) graph of arbitrary size.