performance haskell recursion tail-recursion

Improving efficiency in Stirling numbers calculation

I've been trying to explore the possibilities of recursion in Haskell, but I don't know much about improving efficiency when facing large recursions. Today, I wanted to make a function that calculates Stirling numbers of second kind:

factorial :: Integer -> Integer
factorial n = helper n 1
    where
        helper 1 a = a
        helper n a = helper (n-1) (a*n)


stirling :: Integer -> Integer -> Integer
stirling n 1 = 1
stirling n 2 = ((2 ^n)- 2) `div` 2
stirling n k
    | k > n      = 0
    | k == n     = 1
    | k == (n-1) = (factorial n) `div` (factorial 2 * (factorial(n-2)))
    | otherwise  = stirling (n-1) (k-1) + k *(stirling (n-1) k)

I've been trying to create an auxiliary function, just as in the factorial definition, that could avoid the inconvenience of Haskell´s lazy evaluation taking too many memory resources. I want to do the same, calculating the product in each call of the

k *(stirling (n-1) k)

recursion, but I didn't get the hang of it. Any ideas?

P.D: The code already works quite decently, but I'm new with Haskell and I still don't know what the possibilities are. Any improvement is welcome (although I´d rater not use any library; I want to learn the basics before getting into that).

Solution

Your implementation already appears to be fairly memory efficient, at least in my tests. For example:

% time ./test salferdez 32 16
1194461517469807833782085
11.50user 0.00system 0:11.48elapsed 100%CPU (0avgtext+0avgdata 8704maxresident)k
0inputs+0outputs (0major+1267minor)pagefaults 0swaps

Changing the parameters (up or down) doesn't seems to change the 8MB residency much, so I expect that's just about the minimum startup cost of a Haskell program. On the other hand, it seems quite slow. Parameters even slightly larger than the ones I used here -- say, double them both -- took longer than I cared to wait. Simply implementing the formula from Wikipedia directly appears to be quite a lot faster, while retaining a good memory profile:

% time ./test dmwit 32 16
1194461517469807833782076
0.00user 0.00system 0:00.01elapsed 16%CPU (0avgtext+0avgdata 4352maxresident)k
0inputs+0outputs (0major+206minor)pagefaults 0swaps
% time ./test dmwit 10000 5000
<snip'd very very long output>
10.62user 0.02system 0:10.62elapsed 100%CPU (0avgtext+0avgdata 16716maxresident)k
0inputs+0outputs (1major+26643minor)pagefaults 0swaps

(I did not recompile between these runs and the previous ones, so you can be sure we both got the same level of optimization.)

On the other hand, I do get a slightly different answer than you. Perhaps you can spot a bug in one or the other of our code. In any case, here's the formula from wikipedia^*, and here's my take on its most direct translation:

stirling n k = sum [(-1)^(k-i)*i^n`div`(product [1..k-i]*product[1..i]) | i <- [0..k]]

Don't forget to compile with -O2 and let GHC do all the dirty work of thinking about how to make it fast and memory-efficient.

^* StackOverflow does have a mechanism for including the image directly, rather than linking it, but it doesn't seem to be working for me. Not sure why!