Search code examples
language-agnosticbisonintermediate-languagecompiler-construction

How does backpatching work with markers?


I've searched all over the Internet and could not find a proper explanation of how backpatching works?

Can you please explain me how does backpatching works? How does it work with the markers?

I know it has 2 main types of markers:

  1. with next-quad in it
  2. with next-list in it

I found this code, in which they are taking an input file and creating a file with RISKI language.

In their first roll they have:

PROGRAM : N FUNCTION M MAIN_FUNCTION

and you can see that N and M are markers (they are empty rolls).


Solution

  • One-pass code generation has a small problem with generating code for conditionals. A typical if statement:

    if CONDITION then ALTERNATIVE_1 else ALTERNATIVE_2
    

    needs to be compiled into something like this:

      compute CONDITION
      JUMP_IF_TRUE label1
      JUMP_IF_FALSE label2
    
    label1:
      code for ALTERNATIVE_1
      JUMP label3
    
    label2:
      code for ALTERNATIVE_2
      JUMP label3
    
    label3:
      next statement
    

    But when the code for CONDITION is being generated, it is not known where label1 and label2 are, and when the code for ALTERNATIVE_1 and ALTERNATIVE_2 is being generated, it is not known where label3 is.

    One way to do this is to use symbolic names for the the labels, as in the above pseudocode, and fill in the actual values when they are known. That requires storing a symbolic name in the jump statement, which complicates the datastructures (in particular, you can't just use a binary assembler code). It also requires a second pass, just to fill in jump targets.

    A (possibly) simpler way is to just remember the address of the jump statements, and patch in the target address when it is known. This is known as "backpatching", because you go back and patch the generated code.

    It turns out that in many cases, you end up with multiple branches to the same label. A typical case is "short-circuit" booleans like the C-family's && and || operators. For example, extending the original example:

    if (CONDITION_1 and CONDITION_2) or CONDITION_3 then ALTERNATIVE_1 else ALTERNATIVE_2
    
      compute CONDITION_1
      JUMP_IF_TRUE label1
      JUMP_IF_FALSE label2
    
    label1:
      compute CONDITION_2
      JUMP_IF_TRUE label3
      JUMP_IF_FALSE label2
    
    label2:
      compute CONDITION_3
      JUMP_IF_TRUE label3
      JUMP_IF_FALSE label4
    
    label3:
      code for ALTERNATIVE_1
      JUMP label5
    
    label4:
      code for ALTERNATIVE_2
      JUMP label5
    
    label5:
      next statement
    

    It turns out that for simple languages, it is only necessary to remember two incomplete jump statements (often called "true" and "false"). Because there might be multiple jumps to the same target, each of these is actually a linked list of incomplete jump statements, where the target address is used to point to the next jump in the list. So the backpatching walks back through the list, patching in the correct target and using the original target to find the previous statement which needs to be patched.

    What you call markers (which are an instance of what yacc/bison refers to as "Mid-Rule Productions") are not really related to backpatching. They can be used for many purposes. In one-pass code-generation, it is often necessary to perform some action in the middle of a rule, and backpatching is just one example.

    For example, in the hypothetical if statement, it would be necessary to initialize the backpatch lists before the CONDITION is parsed and then backpatch at the beginning of the THEN and ELSE clauses. (Another backpatch will be triggered at the end of the parse of the entire if statement, but that one will be in the rule's finnal action.)

    The easiest way to perform actions in the middle of a rule is to insert a mid-rule action, which is equivalent to inserting an empty "marker" production with an action, as in the example bison file you point to.