I'm using this (see below) algorithm(take idea from this answer) to code generation from a tree. I'm targeting x86 arch, now I need to deal with mul/div instructions which uses registers eax/ebx as argument.
My question is:
How do I modify this to load operands of a certain instruction to load at fixed register? say, for mul
instruction load left and right subtree on eax
and ebx
registers. My current implementation is: pass current node begin evaluated as argument and if it's MUL
or DIV
set reg
to R0
or R1
according to tree's side, if it's LEFT
or RIGHT
respectively. If reg
is in_use
, push reg
on stack and mark it as begin free(not implmented yet). The current implementation doesn't work because it does assert in assert(r1 != r2)
in emit_load()
function (meaning both registers passed as argument are equals like r1 = REG_R0
and r2 = REG_R0
)
void gen(AST *ast, RegSet in_use, AST *root) {
if(ast->left != 0 && ast->right != 0) {
Reg spill = NoRegister; /* no spill yet */
AST *do1st, *do2nd; /* what order to generate children */
if (ast->left->n >= ast->right->n) {
do1st = ast->left;
do2nd = ast->right;
} else {
do1st = ast->right;
do2nd = ast->left; }
gen(do1st, in_use);
in_use |= 1 << do1st->reg;
if (all_used(in_use)) {
spill = pick_register_other_than(do1st->reg);
in_use &= ~(1 << spill);
emit_operation(PUSH, spill);
}
gen(do2nd, in_use);
ast->reg = ast->left->reg
emit_operation(ast->type, ast->left->reg, ast->right->reg);
if (spill != NoRegister)
emit_operation(POP, spill);
} else if(ast.type == Type_id || ast.type == Type_number) {
if(node->type == MUL || node->type == DIV) {
REG reg;
if(node_side == ASTSIDE_LEFT) reg = REG_R0;
if(node_side == ASTSIDE_RIGHT) reg = REG_R1;
if(is_reg_in_use(in_use, reg)) {
emit_operation(PUSH, reg);
}
} else {
ast->reg = pick_unused_register(in_use);
emit_load(ast);
}
} else {
print("gen() error");
// error
}
}
// ershov numbers
void label(AST ast) {
if(ast == null)
return;
label(ast.left);
label(ast.right);
if(ast.type == Type_id || ast.type == Type_number)
ast.n = 1;
// ast has two childrens
else if(ast.left not null && ast.right not null) {
int l = ast.left.n;
int r = ast.right.n;
if(l == r)
ast.n = 1 + l;
else
ast.n = max(1, l, r);
}
// ast has one child
else if(ast.left not null && ast.right is null)
ast.n = ast.left.n;
else
print("label() error!");
}
With a one-pass code generator like this, your options are limited. It's probably simpler to generate 3-address code or some other linear intermediate representation first and then worry about register targeting (this is the name for what you're trying to accomplish).
Nonetheless, what you want to do is possible. The caveat is that you won't get very high quality code. To make it better, you'll have to throw away this generator and start over.
The main problem you're experiencing is that Sethi-Ulman labeling is not a code generation algorithm. It's just a way of choosing the order of code generation. You're still missing important ideas.
With all that out of the way, some points:
Pushing and popping registers to save them temporarily makes life difficult. The reason is pretty obvious. You can only get access to the saved values in LIFO order.
Things become easier if you allocate "places" that may be either registers or memory locations in the stack frame. The memory locations effectively extend the register file to make it as large as necessary. A slight complication is that you'll need to remember for each function how many words are required for places in that function's stack frame and backpatch the function preamble to allocate that number.
Next, implement a global operand stack where each stack element is a PLACE. A PLACE is a descriptor telling where an operand that has been computed by already-emitted code is located: register or memory and how to access it. (For better code, you can also allow a PLACE to be a user variable and/or immediate value, but such PLACEs are never returned by the PLACE allocator described below. Additionally, the more kinds of PLACEs you allow, the more cases must be handled by the code emitter, also described below.)
The general principle is "be lazy." The later we can wait to emit code, the more information will be available. With more information, it's possible to generate better code. The stack of PLACEs does a reasonably good job of accomplishing this.
The code generator invariant is that it emits code that leaves the result PLACE at the top of the operand stack.
You will also need a PLACE allocator. This keeps track of registers and the memory words in use. It allocates new memory words if all registers and current words are already busy.
Registers in the PLACE allocator can have three possible statuses: FREE, BUSY, PINNED. A PINNED register is one needed to hold a value that can't be moved. (We'll use this for instructions with specific register requirements.) A BUSY register is one needed for a value that's okay to be moved to a different PLACE as required. A FREE register holds no value.
Memory in the PLACE allocator is either FREE or BUSY.
The PLACE allocator needs at least these entry points:
allocate_register
pick a FREE register R, make it BUSY, and return R. If no FREE registers are available, allocate a FREE memory word P, move a BUSY register R's contents there, and return R.pin_register(R)
does as follows: If R is PINNED, raise a fatal error. If R is BUSY, get a FREE PLACE P (either register or memory word), emit code that moves the contents of R to P, mark R PINNED and return. If R is FREE, just mark it PINNED and return.Note that when pinning or allocating register R requires moving its contents, the allocator must update the corresponding element in the operand stack. What was R must be changed to P. For this purpose, the allocator maintains a map taking each register to the operand stack PLACE that describes it.
With all this complete, the code generator for binary ops will be simple:
gen_code_for(ast_node) {
if (ast_node->left_first) {
gen_code_for(ast_node->left_operand)
gen_code_for(ast_node->right_operand)
} else {
gen_code_for(ast_node->right_operand)
gen_code_for(ast_node->left_operand)
swap_stack_top_2() // get stack top 2 elements in correct order
}
emit_code_for(ast_node)
}
The code emitter will work like this:
emit_code_for(ast_node) {
switch (ast_node->kind) {
case DIV: // An operation that needs specific registers
pin_register(EAX) // Might generate some code to make EAX available
pin_register(EDX) // Might generate some code to make EDX available
emit_instruction(XOR, EDX, EDX) // clear EDX
emit_instruction(MOV, EAX, stack(1)) // lhs to EAX
emit_instruction(DIV, stack(0)) // divide by rhs operand
pop(2) // remove 2 elements and free their PLACES
free_place(EDX) // EDX not needed any more.
mark_busy(EAX) // EAX now only busy, not pinned.
push(EAX) // Push result on operand stack
break;
case ADD: // An operation that needs no specific register.
PLACE result = emit_instruction(ADD, stack(1), stack(0))
pop(2)
push(result)
break;
... and so on
}
}
Finally, the instruction emitter must know what to do when its operands have combinations of types not supported by the processor instruction set. For example, it might have to load a memory PLACE into a register.
emit_instruction(op, lhs, [optional] rhs) {
switch (op) {
case DIV:
assert(RAX->state == PINNED && RDX->state == PINNED)
print_instruction(DIV, lhs)
return RAX;
case ADD:
if (lhs->kind == REGISTER) {
print_instruction(ADD, lhs, rhs)
return lhs
}
if (rhs->kind == REGISTER) {
print_instruction(ADD, rhs, lhs)
return rhs
}
// Both operands are MEMORY
R = allocate_register // Get a register; might emit some code.
print_instruction(MOV, R, lhs)
print_instruction(ADD, R, rhs)
return R
... and so on ...
I've necessarily let out many details. Ask what isn't clear.
OP's Questions Addressed
You're right that I intended stack(n)
to be the PLACE that is n
from the top of the operand stack.
Leaves of the syntax tree just push a PLACE for a computed value on the operand stack to satisfy the invariant.
As I said above, you can either create special kinds of PLACEs for these operands (user-labeled memory locations and/or immediate values), or - simpler and as you proposed - allocate a register and emit code that loads the value into that register, then push the register's PLACE onto the stack. The simpler method will result in unnecessary load instructions and consume more registers than needed. For example x = x + 1
will generate code something like:
mov esi, [ebp + x]
mov edi, 1
add esi, edi
mov [ebp + x], esi
Here I'm using x
to denote the base pointer offset of the variable.
With PLACEs for variables and literals, you can easily get:
mov esi, [ebp + x]
add esi, 1
mov [ebp + x], esi
By making the code generator aware of the PLACE where the assignment needs to put its answer, you can get
add [ebp + x], 1
or equivalently
inc [bp + x]
Accomplish this by adding a parameter PLACE *target
to the code generator that describes where the final value of the computed expression value needs to go. If you're not currently compiling an expression, this is set to NULL. Note that with target
, the code generator invariant changes: The expression result's PLACE is at the top of the operand stack unless target
is set. In that case, it's already been computed into the target's PLACE.
How would this work on x = x + 1
? The ASSIGNMENT
case in the emit_code_for
procedure would provide the target
as the PLACE for x
when it calls itself recursively to compile x + 1
. This delegates responsibility downward for getting the computed value to the right memory location, which is x
. The emit_code_for
case for ADD
ow calls emit_code_for
recursively to evaluate the operands x
and 1
onto the stack. Since we have PLACEs for user variables and immediate values, these are pushed on the stack while generating no code at all. The ADD
emitter must now be smart enough to know that if it sees a memory location L and a literal constant C on the stack and the target is also L, then it can emit add L, C
, and it's done.
Remember that every time you make the code generator "smarter" by providing it with more information to make its decisions like this, it will become longer and more complicated because there are more cases to handle.