unix, cc, pdp-11

Meaning of Unix V6 assembly code generated by the cc compiler?


For example, below is a piece of C code and the assembly code that the cc compiler generates for it.

// C code (pre K&R C)    
foo(a, b) {
    int c, d;
    c = a;
    d = b;
    return c+d;
}
// corresponding assembly code generated by cc
.global _foo
.text
_foo:
~~foo:
~a=4
~b=6
~c=177770
~d=177766
jsr r5, csv
sub $4, sp
mov 4(r5), -10(r5)
mov 6(r5), -12(r5)
mov -10(r5), r0
add -12(r5), r0
jbr L1
L1: jmp cret

I can understand most of the code, but I don't know what ~~foo: does, or where the magic numbers in ~c=177770 and ~d=177766 come from. The hardware is a PDP-11/40.


Solution

  • The tildes look like data that describes the stack usage. It helps to recall that the PDP-11 used 16-bit words, and that DEC preferred octal numbers over hexadecimal.

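    The call

    jsr r5, csv
    

    is the function prologue: csv is the C runtime's register-save routine. The jsr pushes the caller's r5, and csv then makes r5 the frame pointer for this function and saves r2-r4 below it, so arguments sit at positive offsets from r5 and locals at negative ones.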

    The numbers are octal offsets from r5 into the stack frame. The frame is built up roughly like this:

    • the caller pushes b and then a onto the stack (positive offsets 6 and 4)
    • the call instruction pushes the return address (offset 2)
    • jsr r5, csv pushes the caller's r5 (offset 0), and csv saves further registers below it
    • c and d are local variables below those (negative offsets, hence the "17777x")

    That line

    ~d=177776
    

    looks odd - I'd expect

    ~d=177766
    

    since it should be below c on the stack. The -10 and -12 offsets in the instruction operands are also octal: -10 octal is -8 decimal, which as a 16-bit two's-complement word is 177770, and -12 octal is -10 decimal, or 177766. That lets you match each offset to its variable.

    That's just an educated guess: I adapted the jsr+r5 idiom a while back in a text-editor.

    The lines with tildes are symbol definitions. A clue for that is in the DECUS C Compiler Reference, found at

    ftp://ftp.update.uu.se/pub/pdp11/rsx/lang/decusc/2.19/005003/CC.DOC
    

    which says

      3.3  Global Symbols Containing Radix-50 '$' and '.' 
             ______ _______ __________ ________     ___
    
        With  this  version  of  Decus C, it is possible to generate and
        access global symbols which contain the Radix-50  '.'  and  '$'.
        The  compiler allows identifiers to contain the Ascii '$', which
        becomes a Radix-50 '$' in the object code.  The AS assembly code
        shows  this  character as a tilde (~).  The underscore character
        (_) in a C program  becomes  a  '.'  in  both  the  AS  assembly
        language  and  in  the  object  code.  This allows C programs to
        access all global symbols:  
    
                extern int $dsw;  
                .  .  .  
                printf("Directive status = %06o\n", $dsw);  
    
        The  above  prints  the current contents of the task's directive
        status word.
    

    So you could read

    ~a=4
    

    as

    $a=4
    

    and see that $a is a (more or less) conventional symbol.