I got into a discussion the other day about some nitty-gritty details of programming languages, and one topic that was brought up was what a function (or closure/lambda/etc.) actually 'is' from a data structures perspective. To be clear, I am not asking what functions do or how they are used, but rather how they are represented in code 'behind the scenes'.
When a function is 'called', what is happening as the code executes? Is a function a struct or object of some kind that is passed around using an identifier? Is a function even represented by a data structure at all? Does it vary significantly from language to language, or are functions generally represented the same way in the compiled/interpreted code? I had always assumed that functions were just sets of operations that affect data and that they don't exist in any sort of struct/object form, but then I started to wonder because there are many cases where functions are treated as variables/parameters and can be used in a manner very similar to structs or objects, for instance with closures.
Example in Swift:
let closure = {(firstInt: Int, secondInt: Int) -> Bool in
return firstInt > secondInt
}
How does this sort of behavior operate (in Swift or any language that supports closures)? What is being stored in the 'closure' constant there? When I call .map(function, iterable) in Python, what is being 'sent' to that call to represent the function? I hope that this question doesn't seem overly broad, but I thought it would be interesting and possibly helpful to know how functions are built and operated on. Most people I've encountered seem to focus only on the details of how variables/structs/objects/etc. operate without paying much attention (or at least not as much) to the inner workings of functions.
I'll answer from a JavaScript perspective, but I think this should provide a pretty general case for imperative languages with first-class functions.
In JavaScript, a function is just an object, which can be given properties like any other object:
var someObj = {};
someObj.someProperty = "hey, I have a property now";
var someFunc = function(a, b) { return a > b; };
someFunc.someProperty = "hey, I have a property now, too";
Thus, anything you can do to an object, you can do to a function. However, function objects have an internal property (a notional property accessible to the JavaScript execution engine, but not accessible to JavaScript code) called [[Code]]
, and another internal property called [[FormalParameters]]
. The spec defines these as:
[[Code]]
- The ECMAScript code of a function.
[[FormalParameters]]
- A possibly empty List containing the identifier Strings of a Function’s FormalParameterList.
So, if your language doesn't support closures, that's really all you need: an list of parameter names and some code to execute (here, "code" is whatever your language normally uses to express runnable code outside of functions: compiled binary, interpretable commands, or intermediate bytecode). It's easy enough to treat these properties in roughly the same way your language already treats normal properties. Whenever you call someFunc(val1, val2)
, you run the contents of the [[Code]]
property after supplying values for the a
and b
formal parameters.
If your language requires typed return values, that's probably a third property, checked at the point when you try to return
something. Also, your language may require typed formal parameters, in wihch case consider the formal parameter list a list of {string, type}
tuples.
Things get much worse when you're dealing with closures. To review, a closure is
In JavaScript, functions can see all variables defined in the functions that contain them:
function foo() {
var qux = 5;
function bar() { return qux; }
return bar;
}
var returnedBar = foo();
returnedBar();
Here, bar
is inside of foo
, so when bar
runs, it can look "up" the scope chain to see the qux
variable from foo
.
This needs an elaborate network of data structures to work properly. In my example, bar
can access the value of foo
's variables. Specifically, bar
can access the variables defined inside of foo
for the given invocation of foo
that caused that particular definition of bar
to become defined. Things can get hairy very quickly if we invoke foo
multiple times:
function foo(quxVal) {
var qux = quxVal;
function bar() { return qux; }
return bar;
}
var returnedBar1 = foo(1);
returnedBar1();
var returnedBar2 = foo(2);
returnedBar2();
Here, returnedBar1
and returnedBar2
have the same functional code, but their lexical environments are different, because they have different "parent" foo
invocations with differing environment records.
In JavaScript, scope is function based. An environment record in JavaScript keeps track of what variables each scope contains, and it maps identifier names to those variables. An environment record table might look like this:
Name Variable
---- --------
foo [value of this var is whatever is in address 0x34F6A2 right now]
bar [value of this var is whatever is in address 0x34F6A6 right now]
A lexical environment is a structure that holds an environment record. Each lexical environment is part of a scope chain, represented as a linked-list:
A Lexical Environment consists of an Environment Record and a possibly null reference to an outer Lexical Environment.
This is how we can look "up" into "parent" functions, as we do when bar
looks at foo
for its value of qux
. We traverse the scope chain, which is a linked list of lexical environment structures, directed upward into higher scopes. Each lexical environment has an environment record (mapping of variable identifier names to variables) and points to its parent lexical environment.
When we call foo(1)
above, it creates a new lexical environment record as part of running the function. That foo(1)
lexical environment has an environment record, which contains a record for the qux
variable inside foo
. Then, when we call bar
, that creates another lexical environment, which points to the parent foo(1)
lexical environment. (The bar
function has a [[Scope]]
internal property, which is used at function-invocation time to tell the new lexical environment what its parent is.) When we call returnedBar1
, the engine looks for the qux
variable in the bar
environment record first; then, when that fails, the engine looks at the parent lexical environment to find the qux
declared in foo
.
When we call foo(2)
, we create a totally new lexical environment, and then we call bar
to create a new lexical environment that is a child of the foo(2)
lexical environment. When we call returnedBar2
, we propagate up a totally different chain to the foo(2)
.
In practical data structure terms, each function has a [[Scope]]
property, set at definition time, that indicates which lexical environment that function should use as a parent for its own lexical environment whenever the function runs. An environment record (the "content" of a lexical environment) can be represented as a mapping of variable names to variables (where a "variable" is a structure with a pointer to whatever value the variable currently represents).
To conclude, here's the practical properties a closure needs:
A lexical environment consist of two parts:
qux
" -- where can we set/get its value in memory?)