I'm observing the following execution times:
stack build
stack build --profile
I'd expect non-profiled execution time to be below 1s, which tells me that the profiled time above makes sense, it's the non-profiled time that is abnormally slow.
A few more details:
megaparsec
. I/O is String
based and is relatively small (~ 150 KBs). I would exclude I/O as a source of the problem. Transformations involve recursive rewriting rules over an ADT. In a few occasions, I use ugly-memo
to speed up such recursive rewrites. ghc-options: -O2 -Wall -fno-warn-unused-do-bind -fno-warn-missing-signatures -fno-warn-name-shadowing -fno-warn-orphans
ghc-prof-options: -O2 -fprof-auto "-with-rtsopts=-p -V0.0001 -hc -s"
-fno-state-hack
does not help-O1
doesn't help (about the same as -O2
), and -O0
is significantly slower, as expected.I realise I'm not giving many details. In particular, I'm not including any code snippet because it's a large code base and I have no idea of which part of it could trigger this behaviour. My point is indeed that I don't know how to narrow it down.
So my question is not "where is the problem?", but rather: what could I do to get closer to the source of the issue, given that the obvious tool (profiling) turns out to be useless in this case?
Though I still don't know the exact reason for the difference between profile and non-profile execution, I now found the culprit in my code.
A function that recurse over the ADT was implemented as follows:
f :: (?env :: Env) => X -> Y
f = memo go
where
go :: (?env :: Env) => X -> Y
go (...) = ...
go x = f (children x) -- memoised recursion over x's children
Note that:
Env
parameter, which is used internally but never changes during the whole recursion. Note that it is declared as an implicit parameter.Profiling info about this function showed many more function calls than expected. Three orders of magnitude more than expected! (Still pretty weird that, despite this, profile execution was reasonably fast).
Turns out, the implicit parameter was the issue. I don't know the exact reason, but I had read blogs about how evil implicit parameters can be and now I have first-hand proof of that.
Changing this function to using only explicit parameters fixed it completely:
f :: Env -> X -> Y
f env = memo (go env)
where
go :: Env -> X -> Y
go env (...) = ...
go env x = f env (children x) -- memoised recursion over x's children
EDIT: Thanks to @leftaroundabout for pointing out that my solution above isn't very correct - it under-uses memoisation (see discussion below). The correct version should be:
f :: Env -> X -> Y
f env = f_env
where
f_env = memo $ go env
go env (...)
go env x = f_env (children x)
What the implicit parameter seemed to cause is that uglymemo failed to recognise previously stored calls, so I got all the overhead that comes with it and no benefit.
My problem is kind of solved, but if someone can explain the underlying reasons for which implicit parameters can cause this behaviour, and why would profile execution be faster, I'd still be very interested to learn that.