Performance Improvements in Haskell

I want to improve my haskell skills of writing really performant code (coming from a C/C++ Background this is important for my ego :D).

So I have written two functions to calculate Pi by the Leibnitz Formula (its not about calculation pi, it was just an example):

calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl (\p x -> p + calcPiStep x) 0.0 [0..n]
calcPiInf = toinf [0..] 0
    where
        toinf = \(x:xs) l -> do 
            let curr = l + calcPiStep x
            curr:toinf xs curr

calcPiInf constructs a infinite List by recursion. calcPiN with a foldl and a lambda with n iterations.

I have found that calcPiInf is faster than calcPiN AND does not run into a stack overflow for too large numbers. First question: is this just because of lazy evaluation?

Secondly I wrote a corresponding C++ Program:

using namespace std;
double calcPi(int n){
    double pi = 0;
    for(size_t i = 0; i < n; i++){
        pi += 4.0*(i%2 == 0?1:-1)/(2.0*i + 1.0);
    }
    return pi;
}
int main(){
    std::cout.precision(10);
    cout << calcPi(5000000) << endl;
}

Which is far faster than my Haskell Solution. Is it theoretically possible to rewrite my Haskell Code to achieve a similar performance as in C++?

Solution

Use foldl' (from Data.List) instead of foldl (and prefer that variant compared to a lazily generated list)
Use explicit type signatures, or you end up with Integer.
Use optimizations (-O2)

The following code takes ~3.599s on my system (GHC 8.0.2, no optimizations)

calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl (\p x -> p + calcPiStep x) 0.0 [0..n]

main = print $ calcPiN 5000000

Using foldl' instead of foldl yields ~1.7s (only ~40% of the original time).

import Data.List
calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl' (\p x -> p + calcPiStep x) 0.0 [0..n]

main = print $ calcPiN 5000000

Using type signatures yields ~0.8s, or another 50% reduction. If we now add optimizations, we end up with 0.066s, which is still around twice as slow as the C++ variant (0.033s on my machine with -O3, gcc), but it's almost there.

Note that we could also have used -O2 immediately to get below a single second, but any improvement before adding -O2 often (but not necessarily!) also leads to an improvement afterwards.

Here are all times depending on whether type signatures, foldl' or optimization flags were used. Note that type signatures together with -O2 already bring us to close to C++'s speed. However, that behaviour might not hold in general, and we need to change some functions depending on the lazyness:

Type annotations	`foldl'`	`-O2`	Runtime [s]
yes	no	yes	0.063
yes	yes	yes	0.063
no	yes	yes	0.180
no	no	yes	0.190
yes	yes	no	0.825
no	yes	no	1.700
yes	no	no	2.477
no	no	no	3.599