Why is the recursive function performing better than the iterative function in elisp?

As a test for one of my classes, our teacher asked us to test a recursive and non-recursive approach to the famous Euclidean Algorithm:

Iterative

(defun gcdi (a b)
  (let ((x a) (y b) r)
    (while (not (zerop y))
      (setq r (mod x y) x y y r))
     x))

Recursive

(defun gcdr (a b)
  (if (zerop b)
     a
     (gcdr b (mod a b))))

And then I ran a test:

(defun test-iterative ()
(setq start (float-time))
   (loop for x from 1 to 100000
     do (gcdi 14472334024676221 8944394323791464)) ; Fibonacci Numbers close to 2^64 >:)
 (- (float-time) start))

(defun test-recursive ()
(setq start (float-time))
   (loop for x from 1 to 100000
     do (gcdr 14472334024676221 8944394323791464)) ; Fibonacci Numbers close to 2^64 >:)
 (- (float-time) start))

And then I ran the timers:

(test-recursive)

: 1.359128475189209

(test-iterative)

: 1.7059495449066162

So my question is this, why did the recursive test perform faster than the iterative test? Isn't iterative almost always better than recursion? Is elisp an exception to this?

Solution

The theoretical answer is that the recursive version is actually tail recursive and thus should compile to iteration.

However, disassembling the functions reveals the truth:

byte code for gcdi:
  args: (a b)
0       varref    a
1       varref    b
2       constant  nil
3       varbind   r
4       varbind   y
5       varbind   x
6       varref    y
7:1     constant  0
8       eqlsign   
9       goto-if-not-nil 2
12      constant  mod
13      varref    x
14      varref    y
15      call      2
16      varset    r
17      varref    y
18      varset    x
19      varref    r
20      dup       
21      varset    y
22      goto      1
25:2    varref    x
26      unbind    3
27      return

byte code for gcdr:
  args: (a b)
0       varref    b
1       constant  0
2       eqlsign   
3       goto-if-nil 1
6       varref    a
7       return    
8:1     constant  gcdr
9       varref    b
10      constant  mod
11      varref    a
12      varref    b
13      call      2
14      call      2
15      return

You can see that the gcdr has almost half as many instructions, but contains two call instructions, which means that ELisp does not, apparently, convert the tail recursive call to iteration. However, function calls in ELisp are relatively cheap and thus the recursive version executes faster.

PS. While the question makes sense, and the answer is actually generally applicable (e.g., the same approach is valid for Python and CLISP, inter alia), one should be aware that choosing the right algorithm (e.g., linearithmic merge-sort instead of quadratic bubble-sort) is much more important than "micro-optimizations" of the implementation.