
Prolog performance and recursion type


I was playing with permutation in a couple of programs and stumbled upon this little experiment:

Permutation method 1:

permute([], []).
permute([X|Rest], L) :-
    permute(Rest, L1),
    select(X, L, L1).

Permutation method 2:

permute([], []).
permute(L, [P | P1]) :-
    select(P, L, L1),
    permute(L1, P1).

Permutation method 3 (use the built-in):

permute(L, P) :- permutation(L, P).

I understand that it's good practice to use tail recursion, and generally using built-ins is supposed to be efficient. But when I run the following:

time(findall(P, permute([1,2,3,4,5,6,7,8,9], P), L)).

I got the following results, which are relatively consistent across several runs:

Method 1:

% 772,064 inferences, 1.112 CPU in 2.378 seconds (47% CPU, 694451 Lips)

Method 2:

% 3,322,118 inferences, 2.126 CPU in 4.660 seconds (46% CPU, 1562923 Lips)

Method 3:

% 2,959,245 inferences, 1.967 CPU in 4.217 seconds (47% CPU, 1504539 Lips)

So the non-tail-recursive method is significantly more efficient in real time.

Is a particular recursion type generally more efficient in real time, all other things being equal (I know that's not always a simple premise)? What this experiment suggests is that I shouldn't always strive for tail recursion; I may need to do a performance analysis first, and then weigh any performance benefit against the other benefits that tail recursion does have.


Solution

  • Really nice question, +1!

    Tail call (and, as a special case, tail recursion) optimization only applies if the predicate is deterministic! This is not the case here, so your predicate will always require the local stack space, no matter in which order you place the goals. The non-tail recursive version is more (time-)efficient here when generating all solutions because it needs to do fewer unifications on backtracking.
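
    For contrast, here is a minimal sketch of a tail-recursive predicate that is deterministic, so last-call optimisation actually does apply (the predicate name sum_/3 is illustrative, not from the answer):

    ```prolog
    % Accumulator-based list sum. The clause heads are distinguished by the
    % first argument, so no choice point remains once a clause is chosen,
    % and the recursive call is the last goal: last-call optimisation lets
    % this run in constant local stack space, unlike the permute/2 variants.
    sum_([], S, S).
    sum_([X|Xs], S0, S) :-
        S1 is S0 + X,
        sum_(Xs, S1, S).
    ```

    For example, ?- sum_([1,2,3,4], 0, S). succeeds deterministically with S = 10, leaving no choice point behind.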

    EDIT: I am expanding on this point since it is well worth studying the performance difference in more detail.

    First, for clarity, I rename the two different versions to make clear which version I am talking about:

    Variant 1: Non-tail recursive:

    permute1([], []).
    permute1([X|Rest], L) :-
        permute1(Rest, L1),
        select(X, L, L1).
    

    Variant 2: Tail-recursive:

    permute2([], []).
    permute2(L, [P|P1]) :-
        select(P, L, L1),
        permute2(L1, P1).
    

    Note again that, although the second version is clearly tail recursive, tail call (and hence also tail recursion) optimisation only helps if the predicate is deterministic, and hence cannot help when we generate all permutations, because choice points are still left in that case.

    Note also that I am deliberately retaining the original variable naming and main predicate name to avoid introducing more variants. Personally, I prefer a naming convention that makes clear which variables denote lists, by appending an s to their names in analogy to the regular English plural. I also prefer predicate names that more clearly exhibit the (at least intended and desirable) declarative, relational nature of the code, and I recommend avoiding imperative names for that reason.
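
    Since select/3 does the real work in both variants, it helps to have its definition in mind when following the unfoldings below. A common textbook definition (renamed select_/3 here to avoid clashing with the library predicate; SWI-Prolog's library(lists) version behaves the same way) is:

    ```prolog
    % select_(X, Xs, Rest): X is an element of Xs, and Rest is Xs with
    % that single occurrence removed. On backtracking, it enumerates
    % every element of Xs in turn.
    select_(X, [X|Xs], Xs).
    select_(X, [Y|Ys], [Y|Zs]) :-
        select_(X, Ys, Zs).
    ```

    For example, ?- select_(X, [a,b,c], Rest). yields X = a, Rest = [b,c], then X = b, Rest = [a,c], then X = c, Rest = [a,b] on backtracking.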


    Consider now unfolding the first variant and partially evaluating it for a list of 3 elements. We start with a simple goal:

    ?- Xs = [A,B,C], permute1(Xs, L).
    

    and then gradually unfold it by plugging in the definition of permute1/2, while making all head unifications explicit. In the first iteration, we obtain:

    ?- Xs = [A,B,C], Xs1 = [B,C], permute1(Xs1, L1), select(A, L, L1).
    

    Here, the newly introduced goal Xs1 = [B,C] is the explicit head unification.

    Now, still one goal of permute1/2 is left. So we repeat the process, again plugging in the predicate's only applicable rule body in place of its head:

    ?- Xs = [A,B,C], Xs1 = [B,C], Xs2 = [C], permute1(Xs2, L2), select(B, L1, L2), select(A, L, L1).
    

    One more pass of this, and we obtain:

    ?- Xs = [A,B,C], Xs1 = [B,C], Xs2 = [C], select(C, L2, []), select(B, L1, L2), select(A, L, L1).
    

    This is what the original goal looks like if we just unfold the definition of permute1/2 repeatedly.
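
    To check this unfolding, the final conjunction can be wrapped in a helper predicate and run directly (unfolded_permute1/2 is an illustrative name, not from the answer); it enumerates exactly the six permutations of a 3-element list:

    ```prolog
    % Runnable form of the fully unfolded goal: all head unifications are
    % already resolved, so on backtracking only the select/3 goals are
    % retried. Uses the standard select/3 from library(lists).
    unfolded_permute1([A,B,C], L) :-
        select(C, L2, []),
        select(B, L1, L2),
        select(A, L,  L1).
    ```

    For example, ?- findall(L, unfolded_permute1([a,b,c], L), Ls). collects six distinct permutations, including [a,b,c] itself.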


    Now, what about the second variant? Again, we start with a simple goal:

    ?- Xs = [A,B,C], permute2(Xs, Ys).
    

    One iteration of unfolding permute2/2 yields the equivalent version:

    ?- Xs = [A,B,C], Ys = [P|P1], select(P, Xs, L1), permute2(L1, P1).
    

    and a second iteration yields:

    ?- Xs = [A,B,C], Ys = [P|P1], select(P, Xs, L1), P1 = [P2|P3], select(P2, L1, L2), permute2(L2, P3).
    

    I leave the third and last iteration as a simple exercise that I strongly recommend you do.


    And from this it is clear what we initially probably hadn't expected: A big difference lies in the head unifications, which the first version performs deterministically right at the start, and the second version performs over and over on backtracking.

    This famous example nicely shows that, somewhat contrary to common expectation, tail recursion can be quite slow if the code is not deterministic.
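
    As a sanity check, the two variants can be verified to describe the same relation on a small example. The helper below (same_solutions/1 is an illustrative name) succeeds because both variants generate the same set of permutations, merely in different orders:

    ```prolog
    permute1([], []).
    permute1([X|Rest], L) :-
        permute1(Rest, L1),
        select(X, L, L1).

    permute2([], []).
    permute2(L, [P|P1]) :-
        select(P, L, L1),
        permute2(L1, P1).

    % Both variants enumerate the same permutations of Xs, just in a
    % different order, so sorting the two solution lists makes them equal.
    same_solutions(Xs) :-
        findall(P, permute1(Xs, P), Ps1),
        findall(P, permute2(Xs, P), Ps2),
        msort(Ps1, Sorted),
        msort(Ps2, Sorted).
    ```

    For example, ?- same_solutions([a,b,c,d]). succeeds: both variants produce the same 24 permutations.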