
symbol table and relocation table in object file


From what I understand, instructions and data in an object file all have addresses. The first data item starts at address 0, and the first instruction also starts at address 0.

The relocation table contains information about instructions that need to be updated if the addresses in the file change, for example if the file is linked together with another. Line A, in the example below, would be in the relocation table. I don't think B would be in the relocation table, since the address of label "equal" is relative to B. Are these correct assumptions?

I know the symbol table shows the labels the file defines and also the labels that haven't been resolved. But what other information does the symbol table contain?

Also, when the assembler translates the instructions to binary, what is placed in the instructions that have unresolved references, such as B in this example?

.data
TEXT: .asciiz "Foo"

.text
.global main
main:
     li t0, 1
     beq t0, 1, equal #B

equal:
    la a0, TEXT
    jal printf #A

Solution

  • Yes, your assumptions are correct. There are various types of relocations, and what the assembler emits into the instruction depends on the type. Generally it's an offset to be added once the symbol's address is known. You can use objdump -dr to see the relocations. For better illustration I have changed your code a little:

    .data
    .int 0
    TEXT: .asciiz "Foo"
    .text
    .global main
    main:
         li $t0, 1
         beq $t0, 1, equal #B
         bne $t0, 42, foo  #C
    
    equal:
         la $a0, TEXT
         jal printf #A
    

    Output of objdump:

    00000000 <main>:
       0:   24080001        li      t0,1
       4:   24010001        li      at,1
       8:   11010004        beq     t0,at,1c <equal>
       c:   00000000        nop
      10:   2401002a        li      at,42
      14:   1501ffff        bne     t0,at,14 <main+0x14>
                            14: R_MIPS_PC16 foo
      18:   00000000        nop
    
    0000001c <equal>:
      1c:   3c040000        lui     a0,0x0
                            1c: R_MIPS_HI16 .data
      20:   0c000000        jal     0 <main>
                            20: R_MIPS_26   printf
      24:   24840004        addiu   a0,a0,4
                            24: R_MIPS_LO16 .data
    

    As you said, there is no relocation for the beq since that's a relative address within this object file.

    The bne I added (the line marked C) references an external symbol, so a relocation entry is needed even though the address is relative. It is of type R_MIPS_PC16, which produces a 16-bit signed word offset to the symbol foo. The instruction encoding requires the offset to be relative to the next word (the delay slot), whereas the relocation is computed relative to the address of the branch itself, so 1 has to be subtracted. That -1 is stored in the instruction's offset field as the two's complement value ffff, which is also why objdump, which does not apply the relocation, shows the branch targeting itself at 14.
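    To make the offset arithmetic concrete, here is a minimal Python sketch that decodes the two branch words from the objdump listing and then applies a simplified model of R_MIPS_PC16. The helper names and the address 0x40 for foo are assumptions made up for illustration, and a real linker also performs range and alignment checks that are omitted here.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(address, word):
    """Byte address that a MIPS branch located at `address` jumps to."""
    offset_words = sign_extend16(word)       # low 16 bits of the instruction
    return address + 4 + offset_words * 4    # counted from the delay slot


def apply_pc16(word, place, symbol_address):
    """Simplified R_MIPS_PC16: word distance from the branch plus the stored addend."""
    addend = sign_extend16(word)             # -1 for the bne in the listing
    field = ((symbol_address - place) >> 2) + addend
    return (word & 0xFFFF0000) | (field & 0xFFFF)


beq_target = branch_target(0x08, 0x11010004)   # B: resolved by the assembler
bne_target = branch_target(0x14, 0x1501FFFF)   # C: still unrelocated

FOO = 0x40                                     # hypothetical final address of foo
patched = apply_pc16(0x1501FFFF, 0x14, FOO)

print(hex(beq_target), hex(bne_target), hex(patched))
```

    With these inputs the beq decodes to 0x1c (the label equal), the unrelocated bne decodes to its own address 0x14, and the patched bne decodes to the assumed address of foo.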

    The la pseudoinstruction has been translated by the assembler into a lui/addiu pair (the latter placed in the delay slot of the jal). For the lui, an R_MIPS_HI16 relocation is created against the .data section, which will fill in the top 16 bits of the address. Since the symbol TEXT is at offset 4 in the .data section, the top 16 bits of that offset are 0, so the lui contains 0. The addiu gets the matching R_MIPS_LO16 relocation for the low 16 bits, and there the instruction contains the offset 4.
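    As a rough illustration of how the HI16/LO16 pair is combined at link time, the following Python sketch adds the addend stored in the two instructions to a section base address and writes the halves back. The function name and the base addresses 0x10010000 and 0x10008000 are assumptions chosen for the example. This is a simplified model, not the linker's actual code.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def apply_hi_lo(lui_word, addiu_word, section_base):
    """Patch a lui/addiu pair so that it loads section_base plus the stored addend."""
    # The addend is split across the pair: high half in lui, signed low half in addiu.
    addend = ((lui_word & 0xFFFF) << 16) + sign_extend16(addiu_word)
    address = (section_base + addend) & 0xFFFFFFFF
    lo = address & 0xFFFF
    # addiu sign-extends its immediate, so the high half is rounded up
    # whenever the low half has bit 15 set.
    hi = ((address + 0x8000) >> 16) & 0xFFFF
    return (lui_word & 0xFFFF0000) | hi, (addiu_word & 0xFFFF0000) | lo


# Words taken from the objdump listing: lui a0,0x0 and addiu a0,a0,4.
simple = apply_hi_lo(0x3C040000, 0x24840004, 0x10010000)   # hypothetical .data base
carry = apply_hi_lo(0x3C040000, 0x24840004, 0x10008000)    # low half >= 0x8000

print([hex(w) for w in simple])
print([hex(w) for w in carry])
```

    The second call shows why the two relocations have to be processed as a pair: when the low half is 0x8000 or more, the addiu subtracts, so the lui must load one more than the plain top 16 bits.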

    Finally, the jal printf uses yet another kind of relocation, R_MIPS_26, which is tailored to the encoding the instruction requires. The stored offset is zero because the jump goes directly to the referenced symbol. Note that objdump tries to be helpful by decoding the target, but it doesn't apply the relocation, so the <main> it prints is of course nonsense.
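    The same idea can be sketched for the jal. The Python below decodes the 26-bit target field the way a disassembler would and then patches it under a simplified model of R_MIPS_26. The helper names and the address 0x00400120 for printf are assumptions for illustration, and the instruction is assumed to stay in the same 256 MB region as its target.

```python
def jal_target(address, word):
    """Byte address that a MIPS jal located at `address` jumps to."""
    index = word & 0x03FFFFFF                       # 26-bit word index
    # The top 4 bits come from the address of the delay slot.
    return ((address + 4) & 0xF0000000) | (index << 2)


def apply_r_mips_26(word, symbol_address):
    """Simplified R_MIPS_26: store (symbol + addend) >> 2 in the low 26 bits."""
    addend = (word & 0x03FFFFFF) << 2               # zero for the jal in the listing
    field = ((symbol_address + addend) >> 2) & 0x03FFFFFF
    return (word & 0xFC000000) | field


unrelocated = jal_target(0x20, 0x0C000000)          # what objdump decodes: address 0

PRINTF = 0x00400120                                 # hypothetical address of printf
patched = apply_r_mips_26(0x0C000000, PRINTF)

print(hex(unrelocated), hex(patched), hex(jal_target(0x20, patched)))
```

    The unrelocated word decodes to address 0, which happens to be where main sits in this object file, and that is why objdump prints <main> next to it.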