Creating a Closure in a Scheme-like Compiler

I'm implementing a scheme-like lisp, that will at some point be compiled into some form of bytecode, which can be seen here. Unfortunately, I have coded myself into a bit of a hole, and am not sure how to get out of it. Essentially, given a lambda that looks like this (the outer b is purposely unused):

(lambda (a b) (lambda (c) (+ a c)))

My code yields this syntax tree:

[
  type: :lambda,
  args: [[type: :word, val: 'a'], [type: :word, val: 'b']],
  body: [
    [
      type: :lambda,
      args: [[type: :word, val: 'c']],
      body: [
        [
          type: :expr,
          val: [
            [type: :word, val: '+'],
            [type: :word, val: 'a'],
            [type: :word, val: 'c']
          ]
        ]
      ]
    ]
  ]
]

Unfortunately, when I actually get to generating bytecode, it won't be easy to create the closures necessary for these lambdas (as far as I can tell). Ideally, I'd like to generate a tree that looks like this:

[
  type: :lambda,
  args: [[type: :word, val: 'a'], [type: :word, val: 'b']],
  closure: [],
  body: [
    [
      type: :lambda,
      args: [[type: :word, val: 'c']],
      closure: [[type: :word, val: 'a']],
      body: [
        [
          type: :expr,
          val: [
            [type: :word, val: '+'],
            [type: :word, val: 'a'],
            [type: :word, val: 'c']
          ]
        ]
      ]
    ]
  ]
]

It is easy enough to tell if a given parameter should be part of the closure by seeing if it shows up in the env, however, since i just call Enum.map over the body I'm not sure how to get that information back to my lambda object. I don't need specific code for how to fix this, but a general guideline/hint/push in the right direction would be great (I know it's a bit vague, but I wasn't quite sure how to make more a specific test case for this).

Solution

You can walk down your AST constructing a list of bound identifiers at each node.

E.g., a lambda node binds its arguments (feel free to rewrite those names if they're already in your bound list), as well as let and let*. You also build a list of referenced free identifiers for each AST node while walking back up this tree.

lambda, let and let* remove identifiers from those free variable lists.

The rest is easy: at every lambda node you calculate the intersection between referenced and bound lists, and the result will be the environment this closure must capture. If it's empty, this is a simple function with no environment.

In your example, it'll be:

[b:() f:()](lambda (a b) [b:(a b) f:(a)] (lambda (c) [b: (a b c) f: (a c)] (+ a c)))

As you can see, the inner lambda have a in common between its b: and f: lists, so you must emit a closure allocation instruction here, constructing an environment of one element, a.

You can combine this pass with variable renaming (e.g., turning lambda argument names into argument numbers).