Search code examples
performancehaskelllazy-evaluation

Performance Improvements in Haskell


I want to improve my haskell skills of writing really performant code (coming from a C/C++ Background this is important for my ego :D).

So I have written two functions to calculate Pi by the Leibnitz Formula (its not about calculation pi, it was just an example):

calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl (\p x -> p + calcPiStep x) 0.0 [0..n]
calcPiInf = toinf [0..] 0
    where
        toinf = \(x:xs) l -> do 
            let curr = l + calcPiStep x
            curr:toinf xs curr

calcPiInf constructs a infinite List by recursion. calcPiN with a foldl and a lambda with n iterations.

I have found that calcPiInf is faster than calcPiN AND does not run into a stack overflow for too large numbers. First question: is this just because of lazy evaluation?

Secondly I wrote a corresponding C++ Program:

using namespace std;
double calcPi(int n){
    double pi = 0;
    for(size_t i = 0; i < n; i++){
        pi += 4.0*(i%2 == 0?1:-1)/(2.0*i + 1.0);
    }
    return pi;
}
int main(){
    std::cout.precision(10);
    cout << calcPi(5000000) << endl;
}

Which is far faster than my Haskell Solution. Is it theoretically possible to rewrite my Haskell Code to achieve a similar performance as in C++?


Solution

    1. Use foldl' (from Data.List) instead of foldl (and prefer that variant compared to a lazily generated list)
    2. Use explicit type signatures, or you end up with Integer.
    3. Use optimizations (-O2)

    The following code takes ~3.599s on my system (GHC 8.0.2, no optimizations)

    calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
    calcPiN n = foldl (\p x -> p + calcPiStep x) 0.0 [0..n]
    
    main = print $ calcPiN 5000000
    

    Using foldl' instead of foldl yields ~1.7s (only ~40% of the original time).

    import Data.List
    calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
    calcPiN n = foldl' (\p x -> p + calcPiStep x) 0.0 [0..n]
    
    main = print $ calcPiN 5000000
    

    Using type signatures yields ~0.8s, or another 50% reduction. If we now add optimizations, we end up with 0.066s, which is still around twice as slow as the C++ variant (0.033s on my machine with -O3, gcc), but it's almost there.

    Note that we could also have used -O2 immediately to get below a single second, but any improvement before adding -O2 often (but not necessarily!) also leads to an improvement afterwards.

    Here are all times depending on whether type signatures, foldl' or optimization flags were used. Note that type signatures together with -O2 already bring us to close to C++'s speed. However, that behaviour might not hold in general, and we need to change some functions depending on the lazyness:

    Type annotations foldl' -O2 Runtime [s]
    yes no yes 0.063
    yes yes yes 0.063
    no yes yes 0.180
    no no yes 0.190
    yes yes no 0.825
    no yes no 1.700
    yes no no 2.477
    no no no 3.599