Working on a compiler, I am facing a problem related to named argument lists and evaluation order. The language guarantees that arguments are evaluated in the order they are written at the call site, even when named arguments bind them to the formal parameters in a different order. This means that for the following functions and function call:
int call(int first, int second) = first - second
int sideEffect(int i) { println(i); return i }
println(call(second: sideEffect(0), first: sideEffect(1)))
The output will be
0
1
1
although the arguments are given in reverse order. I am working on the JVM, so the arguments must be reordered at the bytecode level to match the signature of call. In stack-based pseudo-bytecode:
push 0
call sideEffect(int)
push 1
call sideEffect(int)
swap
call call(int, int)
As you can see, a swap instruction is required here to make sure both arguments are evaluated and passed in the right order. This can be drawn as a graph:
Actual: second  first
             \   /
              swap
             /   \
Formal:  first  second
There is only a single swap instruction, which exchanges the top two elements of the stack, so for more than two arguments the compiler needs to generate local variables:
Actual: third  second  first
          |      |   /¯¯¯¯
        local0  swap   local0
          _____/ |       |
Formal: first  second  third
In bytecode:
third...
store local0
second...
first...
swap
load local0
call ...
Of course this can be extended to arbitrary numbers of actual and formal parameters. What I am now looking for is an algorithm that determines if and where to insert these swap instructions or local variables. A reference to an existing compiler implementation would also be helpful.
Note that matching actual arguments to formal parameters is not part of my problem; my compiler has already resolved that. To simplify, the problem can be looked at like this:
Given two arrays of the same size containing the same elements in different orders, which elements have to be swapped or moved aside so that the first array ends up in the order of the second one?
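To make that concrete, the three-argument example above boils down to the following pair of arrays, plus the permutation that links them (a small Python illustration; the variable names are my own):

actual = ["third", "second", "first"]   # evaluation order of the actual arguments
formal = ["first", "second", "third"]   # order the callee expects on the stack

# One convenient encoding of the task: for each actual argument (in evaluation
# order), the index of the formal parameter it has to end up at.
perm = [formal.index(name) for name in actual]   # == [2, 1, 0]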
With registers only, there's basically only one solution: iterate through the arguments in the desired stack order. If the current argument has already been computed, load it from a local. Otherwise, compute the not-yet-computed arguments that precede it in execution order, storing each of them in a register, and then compute the current argument itself.
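A minimal sketch of that registers-only strategy in Python (the function name and the symbolic op strings are my own; perm[k] is the formal index of the k-th argument in evaluation order, as in the encoding above):

def registers_only(perm):
    n = len(perm)
    stored = {}            # formal index -> local slot holding its value
    next_local = 0
    pending = iter(perm)   # arguments not yet evaluated, in evaluation order
    ops = []
    for j in range(n):     # emit arguments in the stack order the callee expects
        if j in stored:
            ops.append(f"load {stored[j]}")
        else:
            for k in pending:
                ops.append(f"compute {k}")
                if k == j:
                    break                   # leave the value on the stack
                stored[k] = next_local      # evaluated too early: spill it
                next_local += 1
                ops.append(f"store {stored[k]}")
    return ops

print(registers_only([2, 1, 0]))
# ['compute 2', 'store 0', 'compute 1', 'store 1', 'compute 0', 'load 1', 'load 0']

This never needs swap; the price is one store/load pair for every argument that is evaluated before its turn.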
The presence of a swap operation means that we can use the top of the stack as a de facto local variable. Python code below.
import heapq
import itertools


class Allocator:
    """Hands out local-variable slots, preferring the lowest free one."""

    def __init__(self):
        self.free_heap = []
        self.next_free = 0

    def allocate(self):
        if self.free_heap:
            return heapq.heappop(self.free_heap)
        i = self.next_free
        self.next_free += 1
        return i

    def free(self, i):
        heapq.heappush(self.free_heap, i)

    def is_allocated(self, i):
        return i < self.next_free and i not in self.free_heap


def loads_and_stores(perm):
    # perm[k] is the formal index of the k-th argument in evaluation order
    perm = list(perm)
    n = len(perm)
    assert set(perm) == set(range(n))
    location = {}          # formal index -> slot of its already-computed value
    allocator = Allocator()
    i = iter(perm)
    # "location 0" is the top of the stack
    for j in range(n):     # emit the arguments in desired stack order
        if j in location:
            if location[j] != 0:
                # not top of stack; need an actual load
                print('load', location[j])
                if allocator.is_allocated(0):
                    # need to maintain the top of the stack
                    print('swap')
            allocator.free(location[j])
        else:
            while True:
                k = next(i)
                print('compute', k)
                if k == j:
                    if allocator.is_allocated(0):
                        # need to maintain the top of the stack
                        print('swap')
                    break
                location[k] = allocator.allocate()
                if location[k] != 0:
                    # not top of stack; need an actual store
                    print('store', location[k])
    return perm


def test(n):
    for perm in itertools.permutations(range(n)):
        print(perm)
        loads_and_stores(perm)


test(4)
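For reference, perm here lists the formal parameter indices in evaluation order, so the two examples from the question become [1, 0] and [2, 1, 0]. Calling loads_and_stores([1, 0]) prints

compute 1
compute 0
swap

which matches the question's two-argument pseudo-bytecode exactly, while loads_and_stores([2, 1, 0]) prints

compute 2
compute 1
store 1
compute 0
swap
load 1
swap

which is correct but one swap longer than the hand-written three-argument sequence: the allocator happens to keep the first computed value (third) on the stack and spill the second one, whereas the opposite choice would reproduce the shorter six-instruction sequence.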
There are some interesting optimizations to be done here, because allocate can return any free local. An optimal algorithm would consider the execution cost of each allocation choice.