Search code examples
haskellocamlghcspace-leak

Is there a space leak in this Haskell implementation of LPath?


For fun I'm attempting to write an implementation of the naive longest path algorithm (for finding the length of the longest acyclic path in a cyclic graph). I started with a direct port of the imperative algorithm, which worked and performed fairly well.

data Route = Route {dest:: !Int32, cost:: !Int32}

type Node = [Route]

lPathImperative :: V.Vector Node -> Int32 -> UMV.IOVector Bool -> IO (Int32)
lPathImperative !nodes !nodeID !visited = do
  UMV.write visited (fromIntegral nodeID) True
  max <- newIORef 0
  Prelude.mapM_  (\ Route{dest, cost} -> do
         isVisited <- UMV.read visited (fromIntegral dest)
         case isVisited of
           True -> return ()
           False -> do
               dist <- fmap (+ cost) $ lPathImperative nodes dest visited
               maxVal <- readIORef max
               if dist > maxVal then writeIORef max dist else return ())
     (nodes V.! (fromIntegral nodeID))
  UMV.write visited (fromIntegral nodeID) False
  readIORef max

Where visited is an unboxed mutable vector of bools representing whether each node in the graph has currently been visited, all initialised to false, and nodes is a vector of nodes..

I then attempted to make it more functional, by having max as a value that is passed along in a fold, instead of as an IORef, as follows:

lPathFun :: V.Vector Node -> Int32 -> UMV.IOVector Bool -> IO (Int32)
lPathFun !nodes !nodeID !visited = do
  UMV.write visited (fromIntegral nodeID) True
  let max = CM.foldM acc (0::Int32) (nodes V.! (fromIntegral nodeID))
  UMV.write visited (fromIntegral nodeID) False
  max
    where
      acc :: Int32 -> Route -> IO (Int32)
      acc maxDist Route{dest,cost}  = do
          isVisited <- UMV.read visited (fromIntegral dest)
          case isVisited of
            True -> return maxDist
            False -> do
              dist <- fmap (+ cost) $ lPathFun nodes dest visited
              return $ if dist > maxDist then dist else maxDist

This version however fails to complete, running for minutes (the other took seconds for the same input) before dying with out of memory (requested 1048576 bytes). I would be grateful if somebody could take a look over my code for lPathFun and see what I'm doing wrong. I've tried making everything in it strict, but that didn't help, and also tried making everything lazy, with no change. I even tried changing type node to V.Vector route and using strict foldM' on it instead, to no avail.

I suspect the problem is a space leak. This is because I tried translating lPathFun into OCaml and it worked fine (the fact that the OCaml version uses manual recursion shouldn't make a difference: my functional Haskell version initially used manual recursion too, but suffered the same problems as with using foldM):

type route = {dest: int; cost: int}
type node = route array

let rec lPathFun (nodes: node array) nodeID visited =
  visited.(nodeID) <- true;
  let rec loop i maxDist =
    if i < 0 then maxDist
    else
      let neighbour = nodes.(nodeID).(i) in
      if (not visited.(neighbour.dest))
      then
        let dist = neighbour.cost + lPathFun nodes neighbour.dest visited in
        let newMax = if dist > maxDist then dist else maxDist in
        loop (i-1) newMax
      else
        loop (i-1) maxDist in
  let (max: int) = loop (Array.length nodes.(nodeID) - 1) 0 in
  visited.(nodeID) <- false;
  max;;

The GHC version I'm using is 7.8.3.


Solution

  • The let max = ... looks suspicious here:

    lPathFun !nodes !nodeID !visited = do
      UMV.write visited (fromIntegral nodeID) True
      let max = CM.foldM acc (0::Int32) (nodes V.! (fromIntegral nodeID))
      UMV.write visited (fromIntegral nodeID) False
      max
    

    Your code is equivalent to:

      UMV.write ... True
      UMV.write ... False
      CM.foldM acc ...
    

    but I'm sure you want:

      UMV.write visited ... True
      max <- CM.foldM ...
      UMV.write visited ... False
      return max