Search code examples
calgorithmcompiler-constructioncode-generationcompiler-theory

Code generation for expressions with fixed/preassigned register


I'm using this (see below) algorithm(take idea from this answer) to code generation from a tree. I'm targeting x86 arch, now I need to deal with mul/div instructions which uses registers eax/ebx as argument.

My question is:

How do I modify this to load operands of a certain instruction to load at fixed register? say, for mul instruction load left and right subtree on eax and ebx registers. My current implementation is: pass current node begin evaluated as argument and if it's MUL or DIV set reg to R0 or R1 according to tree's side, if it's LEFT or RIGHT respectively. If reg is in_use, push reg on stack and mark it as begin free(not implmented yet). The current implementation doesn't work because it does assert in assert(r1 != r2) in emit_load() function (meaning both registers passed as argument are equals like r1 = REG_R0 and r2 = REG_R0)

      void gen(AST *ast, RegSet in_use, AST *root) {
            if(ast->left != 0 && ast->right != 0) {
                Reg spill = NoRegister; /* no spill yet */
                AST *do1st, *do2nd;     /* what order to generate children */
                if (ast->left->n >= ast->right->n) {
                    do1st = ast->left;
                    do2nd = ast->right;
                } else {
                    do1st = ast->right;
                    do2nd = ast->left; }
                gen(do1st, in_use);
                in_use |= 1 << do1st->reg;
                if (all_used(in_use)) {
                    spill = pick_register_other_than(do1st->reg);
                    in_use &= ~(1 << spill);
                    emit_operation(PUSH, spill); 
                }
                gen(do2nd, in_use);
                ast->reg = ast->left->reg
                emit_operation(ast->type, ast->left->reg, ast->right->reg);
                if (spill != NoRegister)
                    emit_operation(POP, spill);
            } else if(ast.type == Type_id || ast.type == Type_number) {
                if(node->type == MUL || node->type == DIV) {
                    REG reg;
                    if(node_side == ASTSIDE_LEFT)  reg = REG_R0; 
                    if(node_side == ASTSIDE_RIGHT) reg = REG_R1;
                    if(is_reg_in_use(in_use, reg)) {
                        emit_operation(PUSH, reg);
                    }

                } else {
                  ast->reg = pick_unused_register(in_use);
                  emit_load(ast);
             }
            } else {
                print("gen() error");
                // error
            }
    }

// ershov numbers
void label(AST ast) {
    if(ast == null)
        return;

    label(ast.left);
    label(ast.right);

    if(ast.type == Type_id || ast.type == Type_number)
        ast.n = 1;
    // ast has two childrens
    else if(ast.left not null && ast.right not null) {      
        int l = ast.left.n;
        int r = ast.right.n;

        if(l == r)
            ast.n = 1 + l;
        else
            ast.n = max(1, l, r);
    }
    // ast has one child
    else if(ast.left not null && ast.right is null)
        ast.n = ast.left.n;
    else
        print("label() error!");
}

Solution

  • With a one-pass code generator like this, your options are limited. It's probably simpler to generate 3-address code or some other linear intermediate representation first and then worry about register targeting (this is the name for what you're trying to accomplish).

    Nonetheless, what you want to do is possible. The caveat is that you won't get very high quality code. To make it better, you'll have to throw away this generator and start over.

    The main problem you're experiencing is that Sethi-Ulman labeling is not a code generation algorithm. It's just a way of choosing the order of code generation. You're still missing important ideas.

    With all that out of the way, some points:

    Pushing and popping registers to save them temporarily makes life difficult. The reason is pretty obvious. You can only get access to the saved values in LIFO order.

    Things become easier if you allocate "places" that may be either registers or memory locations in the stack frame. The memory locations effectively extend the register file to make it as large as necessary. A slight complication is that you'll need to remember for each function how many words are required for places in that function's stack frame and backpatch the function preamble to allocate that number.

    Next, implement a global operand stack where each stack element is a PLACE. A PLACE is a descriptor telling where an operand that has been computed by already-emitted code is located: register or memory and how to access it. (For better code, you can also allow a PLACE to be a user variable and/or immediate value, but such PLACEs are never returned by the PLACE allocator described below. Additionally, the more kinds of PLACEs you allow, the more cases must be handled by the code emitter, also described below.)

    The general principle is "be lazy." The later we can wait to emit code, the more information will be available. With more information, it's possible to generate better code. The stack of PLACEs does a reasonably good job of accomplishing this.

    The code generator invariant is that it emits code that leaves the result PLACE at the top of the operand stack.

    You will also need a PLACE allocator. This keeps track of registers and the memory words in use. It allocates new memory words if all registers and current words are already busy.

    Registers in the PLACE allocator can have three possible statuses: FREE, BUSY, PINNED. A PINNED register is one needed to hold a value that can't be moved. (We'll use this for instructions with specific register requirements.) A BUSY register is one needed for a value that's okay to be moved to a different PLACE as required. A FREE register holds no value.

    Memory in the PLACE allocator is either FREE or BUSY.

    The PLACE allocator needs at least these entry points:

    1. allocate_register pick a FREE register R, make it BUSY, and return R. If no FREE registers are available, allocate a FREE memory word P, move a BUSY register R's contents there, and return R.
    2. pin_register(R) does as follows: If R is PINNED, raise a fatal error. If R is BUSY, get a FREE PLACE P (either register or memory word), emit code that moves the contents of R to P, mark R PINNED and return. If R is FREE, just mark it PINNED and return.

    Note that when pinning or allocating register R requires moving its contents, the allocator must update the corresponding element in the operand stack. What was R must be changed to P. For this purpose, the allocator maintains a map taking each register to the operand stack PLACE that describes it.

    With all this complete, the code generator for binary ops will be simple:

    gen_code_for(ast_node) {
      if (ast_node->left_first) {
        gen_code_for(ast_node->left_operand)
        gen_code_for(ast_node->right_operand)
      } else {
        gen_code_for(ast_node->right_operand)
        gen_code_for(ast_node->left_operand)
        swap_stack_top_2()  // get stack top 2 elements in correct order
      }
      emit_code_for(ast_node)
    }
    

    The code emitter will work like this:

    emit_code_for(ast_node) {
      switch (ast_node->kind) {
        case DIV:  // An operation that needs specific registers
          pin_register(EAX) // Might generate some code to make EAX available
          pin_register(EDX) // Might generate some code to make EDX available
          emit_instruction(XOR, EDX, EDX) // clear EDX
          emit_instruction(MOV, EAX, stack(1)) // lhs to EAX
          emit_instruction(DIV, stack(0)) // divide by rhs operand
          pop(2) // remove 2 elements and free their PLACES
          free_place(EDX) // EDX not needed any more.
          mark_busy(EAX)  // EAX now only busy, not pinned.
          push(EAX) // Push result on operand stack
          break;
        case ADD: // An operation that needs no specific register.
          PLACE result = emit_instruction(ADD, stack(1), stack(0))
          pop(2)
          push(result)
          break;
        ... and so on
      }
    }
    

    Finally, the instruction emitter must know what to do when its operands have combinations of types not supported by the processor instruction set. For example, it might have to load a memory PLACE into a register.

    emit_instruction(op, lhs, [optional] rhs) {
      switch (op) {
        case DIV:
          assert(RAX->state == PINNED && RDX->state == PINNED)
          print_instruction(DIV, lhs)
          return RAX;
        case ADD:
          if (lhs->kind == REGISTER) {
            print_instruction(ADD, lhs, rhs)
            return lhs
          }
          if (rhs->kind == REGISTER) {
            print_instruction(ADD, rhs, lhs)
            return rhs
          }
          // Both operands are MEMORY
          R = allocate_register // Get a register; might emit some code.
          print_instruction(MOV, R, lhs)
          print_instruction(ADD, R, rhs) 
          return R
          ... and so on ...
    

    I've necessarily let out many details. Ask what isn't clear.

    OP's Questions Addressed

    You're right that I intended stack(n) to be the PLACE that is n from the top of the operand stack.

    Leaves of the syntax tree just push a PLACE for a computed value on the operand stack to satisfy the invariant.

    As I said above, you can either create special kinds of PLACEs for these operands (user-labeled memory locations and/or immediate values), or - simpler and as you proposed - allocate a register and emit code that loads the value into that register, then push the register's PLACE onto the stack. The simpler method will result in unnecessary load instructions and consume more registers than needed. For example x = x + 1 will generate code something like:

    mov esi, [ebp + x]
    mov edi, 1
    add esi, edi
    mov [ebp + x], esi
    

    Here I'm using x to denote the base pointer offset of the variable.

    With PLACEs for variables and literals, you can easily get:

    mov esi, [ebp + x]
    add esi, 1
    mov [ebp + x], esi
    

    By making the code generator aware of the PLACE where the assignment needs to put its answer, you can get

    add [ebp + x], 1
    

    or equivalently

    inc [bp + x]
    

    Accomplish this by adding a parameter PLACE *target to the code generator that describes where the final value of the computed expression value needs to go. If you're not currently compiling an expression, this is set to NULL. Note that with target, the code generator invariant changes: The expression result's PLACE is at the top of the operand stack unless target is set. In that case, it's already been computed into the target's PLACE.

    How would this work on x = x + 1? The ASSIGNMENT case in the emit_code_for procedure would provide the target as the PLACE for x when it calls itself recursively to compile x + 1. This delegates responsibility downward for getting the computed value to the right memory location, which is x. The emit_code_for case for ADD ow calls emit_code_for recursively to evaluate the operands x and 1 onto the stack. Since we have PLACEs for user variables and immediate values, these are pushed on the stack while generating no code at all. The ADD emitter must now be smart enough to know that if it sees a memory location L and a literal constant C on the stack and the target is also L, then it can emit add L, C, and it's done.

    Remember that every time you make the code generator "smarter" by providing it with more information to make its decisions like this, it will become longer and more complicated because there are more cases to handle.