Search code examples
functionlambdaclosurescomputer-sciencelow-level

What Are Functions/Closures/Lambdas, From A Data Structures Perspective?


I got into a discussion the other day about some nitty-gritty details of programming languages, and one topic that was brought up was what a function (or closure/lambda/etc.) actually 'is' from a data structures perspective. To be clear, I am not asking what functions do or how they are used, but rather how they are represented in code 'behind the scenes'.

When a function is 'called', what is happening as the code executes? Is a function a struct or object of some kind that is passed around using an identifier? Is a function even represented by a data structure at all? Does it vary significantly from language to language, or are functions generally represented the same way in the compiled/interpreted code? I had always assumed that functions were just sets of operations that affect data and that they don't exist in any sort of struct/object form, but then I started to wonder because there are many cases where functions are treated as variables/parameters and can be used in a manner very similar to structs or objects, for instance with closures.

Example in Swift:

let closure = {(firstInt: Int, secondInt: Int) -> Bool in
    return firstInt > secondInt
}

How does this sort of behavior operate (in Swift or any language that supports closures)? What is being stored in the 'closure' constant there? When I call .map(function, iterable) in Python, what is being 'sent' to that call to represent the function? I hope that this question doesn't seem overly broad, but I thought it would be interesting and possibly helpful to know how functions are built and operated on. Most people I've encountered seem to focus only on the details of how variables/structs/objects/etc. operate without paying much attention (or at least not as much) to the inner workings of functions.


Solution

  • I'll answer from a JavaScript perspective, but I think this should provide a pretty general case for imperative languages with first-class functions.

    Function values

    In JavaScript, a function is just an object, which can be given properties like any other object:

    var someObj = {};
    someObj.someProperty = "hey, I have a property now";
    
    var someFunc = function(a, b) { return a > b; };
    someFunc.someProperty = "hey, I have a property now, too";
    

    Thus, anything you can do to an object, you can do to a function. However, function objects have an internal property (a notional property accessible to the JavaScript execution engine, but not accessible to JavaScript code) called [[Code]], and another internal property called [[FormalParameters]]. The spec defines these as:

    [[Code]] - The ECMAScript code of a function.

    [[FormalParameters]] - A possibly empty List containing the identifier Strings of a Function’s FormalParameterList.

    So, if your language doesn't support closures, that's really all you need: an list of parameter names and some code to execute (here, "code" is whatever your language normally uses to express runnable code outside of functions: compiled binary, interpretable commands, or intermediate bytecode). It's easy enough to treat these properties in roughly the same way your language already treats normal properties. Whenever you call someFunc(val1, val2), you run the contents of the [[Code]] property after supplying values for the a and b formal parameters.

    If your language requires typed return values, that's probably a third property, checked at the point when you try to return something. Also, your language may require typed formal parameters, in wihch case consider the formal parameter list a list of {string, type} tuples.

    Closures

    Things get much worse when you're dealing with closures. To review, a closure is

    1. functional code (plus parameters and a return type), and
    2. free variables (or non-local variables), a hierarchical set of variables accessible to the functional code, coupled with the identifier names that map onto those variables

    In JavaScript, functions can see all variables defined in the functions that contain them:

    function foo() {
        var qux = 5;
        function bar() { return qux; }
        return bar;
    }
    
    var returnedBar = foo();
    returnedBar();
    

    Here, bar is inside of foo, so when bar runs, it can look "up" the scope chain to see the qux variable from foo.

    This needs an elaborate network of data structures to work properly. In my example, bar can access the value of foo's variables. Specifically, bar can access the variables defined inside of foo for the given invocation of foo that caused that particular definition of bar to become defined. Things can get hairy very quickly if we invoke foo multiple times:

    function foo(quxVal) {
        var qux = quxVal;
        function bar() { return qux; }
        return bar;
    }
    
    var returnedBar1 = foo(1);
    returnedBar1();
    
    var returnedBar2 = foo(2);
    returnedBar2();
    

    Here, returnedBar1 and returnedBar2 have the same functional code, but their lexical environments are different, because they have different "parent" foo invocations with differing environment records.

    Closure Data Structures

    In JavaScript, scope is function based. An environment record in JavaScript keeps track of what variables each scope contains, and it maps identifier names to those variables. An environment record table might look like this:

    Name    Variable
    ----    --------
    foo     [value of this var is whatever is in address 0x34F6A2 right now]
    bar     [value of this var is whatever is in address 0x34F6A6 right now]
    

    A lexical environment is a structure that holds an environment record. Each lexical environment is part of a scope chain, represented as a linked-list:

    A Lexical Environment consists of an Environment Record and a possibly null reference to an outer Lexical Environment.

    This is how we can look "up" into "parent" functions, as we do when bar looks at foo for its value of qux. We traverse the scope chain, which is a linked list of lexical environment structures, directed upward into higher scopes. Each lexical environment has an environment record (mapping of variable identifier names to variables) and points to its parent lexical environment.

    When we call foo(1) above, it creates a new lexical environment record as part of running the function. That foo(1) lexical environment has an environment record, which contains a record for the qux variable inside foo. Then, when we call bar, that creates another lexical environment, which points to the parent foo(1) lexical environment. (The bar function has a [[Scope]] internal property, which is used at function-invocation time to tell the new lexical environment what its parent is.) When we call returnedBar1, the engine looks for the qux variable in the bar environment record first; then, when that fails, the engine looks at the parent lexical environment to find the qux declared in foo.

    When we call foo(2), we create a totally new lexical environment, and then we call bar to create a new lexical environment that is a child of the foo(2) lexical environment. When we call returnedBar2, we propagate up a totally different chain to the foo(2).

    In practical data structure terms, each function has a [[Scope]] property, set at definition time, that indicates which lexical environment that function should use as a parent for its own lexical environment whenever the function runs. An environment record (the "content" of a lexical environment) can be represented as a mapping of variable names to variables (where a "variable" is a structure with a pointer to whatever value the variable currently represents).

    Conclusion

    To conclude, here's the practical properties a closure needs:

    • Code (some kind of runnable code) - used to run the function
    • Parameter list (list of string names and, optionally, types) - used to identify a function's arguments when it is called
    • Return type (type) - used to validate the function's return value
    • Lexical scope object (lexical environment) - when the function is called, the lexical environment created during the function invocation uses this lexical environment object as its parent

    A lexical environment consist of two parts:

    • Environment record (mapping of variable names to variables) - used to find the variable associated with a given identifier name (e.g., we want a variable named "qux" -- where can we set/get its value in memory?)
    • Parent lexical environment (lexical environment object) - A parent lexical environment, used as part of a linked list, to allow variable look-ups up the scope chain