lambda scheme evaluation sicp continuation-passing

Continuation passing style procedures

I have troubles understanding how procedures in continuation passing style "remember" values from previous function calls.

As an example I have the following procedure that will filter the even values from a list:

(define (get-pairs alist proc)
  (if (null? alist)
      (proc '())
      (get-pairs
       (cdr alist)
       (lambda (l)
         (let ((num (car alist)))
           (if (zero? (remainder num 2))
               (proc (cons num l))
               (proc l)))))))

Then I call it with:

(get-pairs '(1 2)
           (lambda (n) (display n)))

To get the expected result (2).

get-pairs will recursively call itself until its parameter alist is empty. Then the last function call would be: (get-pairs '() proc). proc would be the procedure:

(lambda (l)
         (let ((num (car alist)))
           (if (zero? (remainder num 2))
               (proc (cons num l))
               (proc l))))

In this lambda body, alist and proc are the parameters of the function call that came prior: (get-pairs '(2) proc). My question is, how does each lambda procedure "remembers" the parameters of past function calls if proc is evaluated only at the very end?

Or is it that at each call to get-pairs, the body of the lambda passed as argument to the next call is "analyzed", with the corresponding values of alist and proc already substituted into its body?

Solution

TL;DR: closures created by tail-call-optimized functions must capture a copy of (relevant parts of) their definitional environment. Or, just ignore the TCO part, and treat it as you would a regular recursive function, where any lambda function created during the recursive function's execution is a closure, captures the values of variables that it refers to.

This can be understood in the framework of the environment model of Scheme evaluation.

Each call to (lambda (...) ...) creates a new lambda function object, implicitly paired up with its definitional environment, together known as closure.

Each invocation of get-pairs creates its own fresh new call frame, and any lambdas created from that will hold onto the hidden pointer into (a copy of) that frame.

This can be easier to see with the following variants, which perform exactly the same function as the one in the question:

(define (get-pairs1 alist proc)
  (if (null? alist)
      (proc '())
      (get-pairs1
       (cdr alist)
       (let ((alist alist))   ; creates fresh new environment frame
         (lambda (l)
           (let ((num (car alist)))
             (if (zero? (remainder num 2))
                 (proc (cons num l))
                 (proc l))))))))

(define (get-pairs2 alist proc)
  (if (null? alist)
      (proc '())
      (get-pairs2
       (cdr alist)
       (let* ((alist  alist)
              (num  (car alist))
              (newproc 
               (if (zero? (remainder num 2))
                   (lambda (l) (proc (cons num l)))
                   (lambda (l) (proc l)))))
         newproc))))

proc is not "evaluated at the very end", the procedure which is the variable proc's value is called at the very end, but the variable proc's value is found out at each invocation. And at each invocation that value is different, i.e. new lambda function object is created afresh on each separate invocation of get-pairs. The value of the variable proc at each invocation of get-pairs is different.

So, for the example call (get-pairs2 '(1 2 3 4) display), the final proc's call is the same as

((lambda (l4)             ;                  |
    ((lambda (l3)          ;             |   |
        ((lambda (l2)       ;        |   |   |
            ((lambda (l1)    ;   |   |   |   |
                (display      ;  1   2   3   4
                 l1))         ;  |   |   |   |
             (cons 2 l2)))   ;       |   |   |
         l3))               ;            |   |
     (cons 4 l4)))         ;                 |
 '())

;; i.e.
;;  l1 = cons 2 l2
;;              l2 = l3
;;                   l3 = cons 4 l4
;;                               l4 = '()

Which can also be written, in pseudocode, as

(((((display ∘ identity) ∘ {cons 2}) ∘ identity) ∘ {cons 4}) '())
;   └───────1──────────┘
;  └───────────────2───────────────┘
; └─────────────────────────3──────────────────┘
;└───────────────────────────────────4─────────────────────┘

;; 1: created on 1st invocation of `get-pairs2`
;; 2: created on 2nd invocation of `get-pairs2`
;; 3: created on 3rd invocation of `get-pairs2`
;; 4: created on the final 4th invocation of `get-pairs2`, 
;;    and then called with `'()` as the argument

where {cons n} signifies a partially applied cons, i.e. (lambda (l) (cons n l)), and identity is (lambda (l) l).

Oh, and ∘ stands for the function composition, (f ∘ g) = (lambda (x) (f (g x))).

See also some other of my answers that might be relevant, here and here.

Working through the invocation (get-pairs2 '(1 2 3 4)) step-by-step, with let-based re-writes emulating the function calls, we get (simplifying a bit)

(get-pairs2 '(1 2 3 4) display)
=
(let ((alist '(1 2 3 4))                       ; '(1 2 3 4)
      (proc  display))                         
  (let* ((num (car alist))                       ; 1       
         (newproc (lambda (l) (proc l))))            
    (let ((alist (cdr alist))                    ; '(2 3 4)  
          (proc newproc))                          
      (let* ((num (car alist))                     ; 2   
             (newproc (lambda (l) (proc (cons num l))))) 
        (let ((alist (cdr alist))                  ; '(3 4)
              (proc newproc))         
          (let* ((num (car alist))                   ; 3  
                 (newproc (lambda (l) (proc l))))  
            (let ((alist (cdr alist))                ; '(4)
                  (proc newproc))
              (let* ((num (car alist))                 ; 4
                     (newproc (lambda (l) (proc (cons num l)))))
                (let ((alist (cdr alist))              ; '()
                      (proc newproc))
                  (proc '()))))))))))

Loading it up in DrRacket's code editing window and hovering with your mouse over the various identifiers is a fun game that lets you see what each identifier is referring to. Running this code with Ctrl-R also produces the same results as the the original function call.

Another "fun" exercise is to go over the above nested let expression and manually rename each identifier by adding a unique index to it (changing proc to proc1, proc2 etc.) so each name becomes unique.

Okay, I'll do it for you, especially that DrRacket has a nice "rename identifier" feature which makes it much easier and less error-prone. But do try doing it on your own too.

(let ((alist '(1 2 3 4))                         ; '(1 2 3 4)
      (proc  display))
  (let* ((num (car alist))                         ; 1
         (newproc (lambda (l) (proc l))))
    (let ((alist2 (cdr alist))                     ; '(2 3 4)
          (proc2 newproc))
      (let* ((num2 (car alist2))                     ; 2
             (newproc2 (lambda (l) (proc2 (cons num2 l)))))
        (let ((alist3 (cdr alist2))                  ; '(3 4)
              (proc3 newproc2))
          (let* ((num3 (car alist3))                   ; 3
                 (newproc3 (lambda (l) (proc3 l))))
            (let ((alist4 (cdr alist3))                ; '(4)
                  (proc4 newproc3))
              (let* ((num4 (car alist4))                 ; 4
                     (newproc4 (lambda (l) (proc4 (cons num4 l)))))
                (let ((alist5 (cdr alist4))              ; '()
                      (proc5 newproc4))
                  (proc5 '()))))))))))

So you see, it's not the same proc. There are five of them, each potentially different, each residing in a different, nested environment frame.

You might ask, why nested environments? After all the get-pairs2 is tail-recursive, so it must not do that, can re-use its call frame for the next invocation.

That's true, but still it's an implementational detail related to the code's operational efficiency, which doesn't change its meaning (semantics). Semantically it is much easier to see what the code means, with the nested let re-writes.

Nevertheless that is a valid point and potential source of your confusion. I once was confused exactly by this point as well.

And that is why I wrote "(the copy of) the environment frame" there at the start of this post. Even if the tail-recursive call can -- maybe even must, under Scheme's TCO guarantee -- re-use its own call frame for the next invocation, the freshly created closure must hold on to its own copy, to not introduce the erroneous conflation of semantically different identifiers.

Indeed, that environment flattening and frame reuse can be described by the following timeful computation:

;; re-use the tail-recursive call frame {alist proc}
(let ((alist '(1 2 3 4))
      (proc  display)
      (num #f))
  (set! num  (car alist))                        ; 1
  (set! proc (let ((num num) (proc proc))        ; closure!
               (lambda (l) (proc l))))
  (set! alist (cdr alist))                       ; (2 3 4)
  (set! num  (car alist))                        ;  2
  (set! proc (let ((num num) (proc proc))        ; closure!
               (lambda (l) (proc (cons num l)))))
  (set! alist (cdr alist))                       ;   (3 4)
  (set! num  (car alist))                        ;    3
  (set! proc (let ((num num) (proc proc))        ; closure!
               (lambda (l) (proc l))))
  (set! alist (cdr alist))                       ;     (4)
  (set! num (car alist))                         ;      4
  (set! proc (let ((num num) (proc proc))        ; closure!
               (lambda (l) (proc (cons num l)))))
  (set! alist (cdr alist))                       ;      ()
  (proc '()))

Or as the definition which it might actually get compiled as,

(let ((alist '(1 2 3 4))
      (proc  display)
      (num #f))
  (let loop ()
    (set! num  (car alist))
    (set! proc (let ((num num) (proc proc))
                 (if (zero? (remainder num 2))
                     (lambda (l) (proc (cons num l)))
                     (lambda (l) (proc l)))))
    (set! alist (cdr alist)) 
    (if (null? alist)
       (proc '())
       (loop))))

So how many procs are there now? :)

_{(still five, for otherwise it wouldn't be working... i.e. there's one binding but five values were created during the loop's run, each encasing the previous one inside it (or, actually, holding a reference to it); and when the last proc value -- which is a function -- finally runs, it invokes the one "inside" it, and that one invokes the one "inside" it, and so on going back to the very first proc, the display which which we've started.)}