Why is there such a massive difference in compile time between consteval/constexpr and template metafunctions?

I was curious how far I could push gcc as far as compile-time evaluation is concerned, so I made it compute the Ackermann function, specifically with input values of 4 and 1 (anything higher than that is impractical):

consteval unsigned int A(unsigned int x, unsigned int y)
{
    if(x == 0)
        return y+1;
    else if(y == 0)
        return A(x-1, 1);
    else
        return A(x-1, A(x, y-1));
}

unsigned int result = A(4, 1);

(I think the recursion depth is bounded at ~16K but just to be safe I compiled this with -std=c++20 -fconstexpr-depth=100000 -fconstexpr-ops-limit=12800000000)

Not surprisingly, this takes up an obscene amount of stack space (in fact, it causes the compiler to crash if run with the default process stack size of 8mb) and takes several minutes to compute. However, it does eventually get there so evidently the compiler could handle it.

After that I decided to try implementing the Ackermann function using templates, with metafunctions and partial specialization pattern matching. Amazingly, the following implementation only takes a few seconds to evaluate:

template<unsigned int x, unsigned int y>
struct A {
    static constexpr unsigned int value = A<x-1, A<x, y-1>::value>::value;
};

template<unsigned int y>
struct A<0, y> {
    static constexpr unsigned int value = y+1;
};

template<unsigned int x>
struct A<x, 0> {
  static constexpr unsigned int value = A<x-1, 1>::value;
};

unsigned int result = A<4,1>::value;

(compile with -ftemplate-depth=17000)

Why is there such a dramatic difference in evaluation time? Aren't these essentially equivalent? I guess I can understand the consteval solution requiring slightly more memory and evaluation time because semantically it consists of a bunch of function calls, but that doesn't explain why this exact same (non-consteval) function computed at runtime only takes slightly longer than the metafunction version (compiled without optimizations).

Why is consteval so slow? And how can the metafunction version be so fast? It's actually not much slower than optimized machine-code.

Solution

In the template version of A, when a particular specialization, say A<2,3>, is instantiated, the compiler remembers this type, and never needs to instantiate it again. This comes from the fact that types are unique, and each "call" to this meta-function is just computing a type.

The consteval function version is not optimized to do this, and so A(2,3) may be evaluated multiple times, depending on the control flow, resulting in the performance difference you observe. There's nothing stopping compilers from "caching" the results of function calls, but these optimizations likely just haven't been implemented yet.