
Why is factorial calculation much faster in Haskell than in Java


One of the programming problems I have come across involves calculating the factorials of large numbers (up to 10^5). I have seen a simple Haskell solution that goes like this:

factorial :: (Eq x, Num x) => x -> x
factorial 0 = 1
factorial a = a * factorial (a - 1)

which handles the huge numbers implicitly and somehow runs faster, even though no caching is involved in the code.

When I tried to solve the problem in Java, I had to use BigInteger to hold the huge numbers and also use an iterative version of factorial:

public static BigInteger factorialIterative(int n)
{
    if (n == 0 || n == 1) return BigInteger.valueOf(1);
    BigInteger f = BigInteger.valueOf(1);
    for (int i = 1; i <= n; i++)
        f = f.multiply(BigInteger.valueOf(i));
    return f;
}

The above code exceeded the set time limit of execution for the program. I also tried a cached recursive version of factorial:

public static BigInteger factorial(int n)
{
    if (cache[n] != null)
        return cache[n];
    else if (n == 0)
        return BigInteger.ONE;
    else {
        cache[n] = BigInteger.valueOf(n).multiply(factorial(n - 1));
        return cache[n];
    }
}

which gave me an out-of-memory error (probably due to the recursion).

My question is: why are functional programming languages like Haskell better at handling these sorts of problems involving huge numbers, despite no obvious caching? Is there a way I can make the Java code run as fast as the Haskell code?


Solution

  • Here's a related question: https://softwareengineering.stackexchange.com/q/149167/26988

    It seems that in this particular case, you're seeing the difference in optimizations of a pure vs impure function. In Haskell, all functions are pure unless they're doing IO (see link).

    I think what's going on is that GHC is able to optimize the code better because of the guarantee of purity. Even though there isn't a tail call, it knows there aren't any side effects (because of purity guarantee), so it can do some optimizations the Java code can't (like automatic caching and whatnot as @andrew mentions in his answer).

    A better solution in Haskell would be to use the built-in product function:

    factorial n = product [1..n]
    

    This is able to do tail-call optimization because it's just iteration. The same can be done in Java with a for loop as in your example, but it won't have the benefit of being functionally pure.
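    To make the comparison concrete, here is a sketch of the same `product [1..n]` shape in Java using streams (the class and method names are mine, not from the question; requires Java 8+):

```java
import java.math.BigInteger;
import java.util.stream.IntStream;

public class Product {
    // Java analogue of Haskell's `product [1..n]`:
    // fold multiplication over 1..n, starting from ONE.
    static BigInteger factorial(int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(BigInteger::valueOf)
                .reduce(BigInteger.ONE, BigInteger::multiply);
    }

    public static void main(String[] args) {
        System.out.println(Product.factorial(20)); // 2432902008176640000
    }
}
```

    Like the for loop, this runs in constant stack space; it just expresses the fold explicitly rather than through mutation.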

    Edit:

    I assumed tail-call elimination was happening, but it apparently is not. Here's the original answer for reference (it still has useful information about why Haskell may be faster than Java in some recursive contexts).

    Functional programming languages like Haskell take advantage of tail-call elimination.

    In most programming languages, recursive calls maintain a call stack. Each recursive call allocates a new stack frame, which isn't cleaned up until the call returns. For example:

    call fact()
        call fact()
            call fact()
            cleanup
        cleanup
    cleanup
    

    Functional languages, however, don't need to maintain a stack. In procedural languages, it's often difficult to tell whether the return value will be used by the calling function, so it's hard to optimize. In FP, however, the return value only makes sense when the recursion is complete, so you can eliminate the call stack and end up with something like this:

    call fact()
    call fact()
    call fact()
    cleanup
    

    The call fact() lines can all happen in the same stack frame because the return value isn't needed in intermediate calculations.
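    The shape that makes frame reuse possible is accumulator passing: the pending multiplication is folded into an extra argument, so the recursive call is the very last action. A hypothetical Java version (the name `factAcc` is mine):

```java
import java.math.BigInteger;

public class TailFact {
    // Tail-recursive form: the multiply happens *before* the recursive
    // call, so the caller's frame holds no pending work. A compiler with
    // tail-call elimination can run this in constant stack space; the
    // JVM does not do this, so in Java the stack still grows with n.
    static BigInteger factAcc(int n, BigInteger acc) {
        if (n == 0) return acc;
        return factAcc(n - 1, acc.multiply(BigInteger.valueOf(n)));
    }

    public static void main(String[] args) {
        System.out.println(TailFact.factAcc(10, BigInteger.ONE)); // 3628800
    }
}
```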

    Now, to answer your question, you can solve this problem in a variety of ways, all of which aim to eliminate the call stack:

    • use a for loop instead of recursion (usually the best option)
    • return void and hope the compiler does tail call elimination
    • use a trampoline function (similar to the for-loop idea, but it looks more like recursion)
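    A minimal sketch of the trampoline idea in Java (all names here are illustrative, and it assumes Java 16+ for records and pattern matching): each "recursive call" returns a thunk instead of calling itself, and a plain loop drives the computation, so the stack never grows.

```java
import java.math.BigInteger;
import java.util.function.Supplier;

public class Trampoline {
    // A computation step is either finished (Done) or has more work (More).
    interface Step { }
    record Done(BigInteger value) implements Step { }
    record More(Supplier<Step> next) implements Step { }

    // Returns a thunk rather than recursing directly, so the call depth
    // stays constant no matter how large n is.
    static Step fact(int n, BigInteger acc) {
        if (n == 0) return new Done(acc);
        return new More(() -> fact(n - 1, acc.multiply(BigInteger.valueOf(n))));
    }

    // The driver loop: unwraps thunks until the computation finishes.
    static BigInteger run(Step s) {
        while (s instanceof More m) s = m.next().get();
        return ((Done) s).value;
    }

    public static void main(String[] args) {
        System.out.println(Trampoline.run(Trampoline.fact(6, BigInteger.ONE))); // 720
        // Even n = 100000 completes without a StackOverflowError:
        Trampoline.run(Trampoline.fact(100_000, BigInteger.ONE));
    }
}
```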

    Note:

    It's not guaranteed that recursive calls will reuse the same stack frame, so some implementations may reallocate on each recursive call. This is often easier and still provides the same memory safety as reusing the stack frame.
