Search code examples
haskellprofiling

Program runs fine with profiling, slower without profiling


I'm observing the following execution times:

  • ~9s when compiled with stack build
  • ~5s when compiled with stack build --profile

I'd expect non-profiled execution time to be below 1s, which tells me that the profiled time above makes sense, it's the non-profiled time that is abnormally slow.

A few more details:

  • The program reads a relational algebra-like DSL, applies a series of rule-based transformations, and outputs a translation to SQL. Parsing is done with megaparsec. I/O is String based and is relatively small (~ 150 KBs). I would exclude I/O as a source of the problem. Transformations involve recursive rewriting rules over an ADT. In a few occasions, I use ugly-memo to speed up such recursive rewrites.
  • Using stack 2.9.1 with LTS 18.28, ghc 8.10.7
    (EDIT: upgrading to LTS 20.11, ghc 9.2.5, does not help)
  • In the cabal file:
  ghc-options:        -O2 -Wall -fno-warn-unused-do-bind -fno-warn-missing-signatures -fno-warn-name-shadowing -fno-warn-orphans
  ghc-prof-options:   -O2 -fprof-auto "-with-rtsopts=-p -V0.0001 -hc -s"
  • Notice that none of the above is new, but I have never observed this behaviour before.
  • I've seen this related question, but compiling with -fno-state-hack does not help
  • Compiling with -O1 doesn't help (about the same as -O2), and -O0 is significantly slower, as expected.
  • The profiling information shows no culprit. The problem only shows up with non-profiled execution.

I realise I'm not giving many details. In particular, I'm not including any code snippet because it's a large code base and I have no idea of which part of it could trigger this behaviour. My point is indeed that I don't know how to narrow it down.

So my question is not "where is the problem?", but rather: what could I do to get closer to the source of the issue, given that the obvious tool (profiling) turns out to be useless in this case?


Solution

  • Though I still don't know the exact reason for the difference between profile and non-profile execution, I now found the culprit in my code.

    A function that recurse over the ADT was implemented as follows:

    f :: (?env :: Env) => X -> Y
    f = memo go
    where
      go :: (?env :: Env) => X -> Y
      go (...) = ...
      go x = f (children x) -- memoised recursion over x's children
    

    Note that:

    • each recursion is memoised, as it the data structure may contain many duplicates
    • the function uses an Env parameter, which is used internally but never changes during the whole recursion. Note that it is declared as an implicit parameter.

    Profiling info about this function showed many more function calls than expected. Three orders of magnitude more than expected! (Still pretty weird that, despite this, profile execution was reasonably fast).

    Turns out, the implicit parameter was the issue. I don't know the exact reason, but I had read blogs about how evil implicit parameters can be and now I have first-hand proof of that.

    Changing this function to using only explicit parameters fixed it completely:

    f :: Env -> X -> Y
    f env = memo (go env)
    where
      go :: Env -> X -> Y
      go env (...) = ...
      go env x = f env (children x) -- memoised recursion over x's children
    

    EDIT: Thanks to @leftaroundabout for pointing out that my solution above isn't very correct - it under-uses memoisation (see discussion below). The correct version should be:

    f :: Env -> X -> Y
    f env = f_env
    where 
      f_env = memo $ go env
      go env (...)
      go env x = f_env (children x)
    

    What the implicit parameter seemed to cause is that uglymemo failed to recognise previously stored calls, so I got all the overhead that comes with it and no benefit.

    My problem is kind of solved, but if someone can explain the underlying reasons for which implicit parameters can cause this behaviour, and why would profile execution be faster, I'd still be very interested to learn that.