assembly disassembly gnu-assembler addressing-mode 68000

Motorola 68000 assembler syntax for Program Counter Indirect with Index

I've been putting together my own disassembler for Sega Mega Drive ROMs, basing my initial work on the MOTOROLA M68000 FAMILY Programmer’s Reference Manual. Having disassembled a considerable chunk of the ROM, I've attempted to reassemble the output with VASM, since its mot syntax module accepts Motorola assembly syntax.

For the vast majority of the reassembly this has worked well. However, there is one wrinkle with operations whose effective address is defined by the "Program Counter Indirect with Index (8-Bit Displacement) Mode". Given that I'm only now learning Motorola 68000 assembly, I wanted to confirm my understanding and to ask: what is the proper syntax for these operations?

Interpretation

For example, if I have two words:

4ebb 0004

I've interpreted this as a JSR with the target destination being the sum of:

the contents of pc
0x04
the contents of d0

(Given that I am restricting myself to the 68000, I've elided any consideration of size and scale in the extension word). Based on how this addressing mode is described in the reference manual, I've emitted this as:

jsr ($04,pc,d0)
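To make the interpretation above concrete, here is a minimal Python sketch of how the two words could be decoded. The function names `is_jsr_pc_index` and `decode_brief_extension` are hypothetical (they are not from my disassembler), and the sketch only covers the 68000 brief extension word, ignoring the scale bits used by later CPUs.

```python
def is_jsr_pc_index(opcode):
    """True if the opcode word is JSR with EA mode 111, register 011
    (Program Counter Indirect with Index)."""
    return (opcode & 0xFFC0) == 0x4E80 and (opcode & 0x3F) == 0x3B


def decode_brief_extension(word):
    """Split a 68000 brief extension word into (index register, size, displacement)."""
    reg_type = "a" if word & 0x8000 else "d"   # bit 15: address or data register
    reg_num = (word >> 12) & 0x7               # bits 14-12: register number
    size = "l" if word & 0x0800 else "w"       # bit 11: long or sign-extended word index
    disp = word & 0xFF                         # bits 7-0: 8-bit displacement
    if disp & 0x80:                            # sign-extend the displacement
        disp -= 0x100
    return f"{reg_type}{reg_num}", size, disp


if __name__ == "__main__":
    print(is_jsr_pc_index(0x4EBB))
    print(decode_brief_extension(0x0004))
```

For the words `4ebb 0004` this yields index register d0, word size, and a displacement of 4, which matches the reading given above.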

Assembling with VASM

However, when I feed this back into VASM it will emit the following error:

error 2030 in line X of "XXXX.asm": displacement out of range
>  jsr ($04,pc,d0)

which seems a very strange error to emit, given that the displacement can't be known until runtime, due to the use of the d0 register. Playing around with this, it appears to use the first part of the operand ($04) as the absolute target destination, and calculates a different displacement based on that.

Assembling with GNU `as`

If I switch to GNU as, the syntax that provides identical output to the original ROM is:

jsr %pc@(0x04,%d0:w)

which appears to indicate that the first part of the operand is the displacement. However, when I disassemble this using objdump, the listed instruction is:

jsr %pc@(0x6,%d0:w)

which seems to indicate that, in the MIT syntax that as uses, the first part of the operand is once again the absolute address.
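One way to reconcile the two numbers is that the displacement is applied to the address of the extension word, which is the instruction address plus 2. The sketch below assumes the instruction sat at address 0 in the object file (something I have not stated above, so treat it as an assumption); the helper name `pc_index_base` is hypothetical.

```python
def pc_index_base(instr_addr, disp8):
    """Base address (before the index register is added) of a
    PC-indirect-with-index operand on the 68000.

    The PC value used is the address of the extension word,
    i.e. the instruction address plus 2.
    """
    return instr_addr + 2 + disp8


if __name__ == "__main__":
    # Assumed: the jsr is the first instruction in the object file.
    print(hex(pc_index_base(0x00, 0x04)))
```

Under that assumption the base works out to 0x6, so a disassembler that prints the resolved address rather than the raw displacement would show 0x6 for an encoded displacement of 0x04.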

Ultimate question

This confusion between the two syntaxes and even between the as assembly and subsequent disassembly makes me wonder what the correct syntax should be, or if perhaps instructions using this addressing mode tend to be generated by the assembler as part of macros or other higher level constructs.

Summary of findings

Thinking about the points @tofro raised has put me in the correct direction, and this is what I've arrived at:

Using a label

Both of the assemblers I've tested (VASM and GNU as) will properly handle a label provided in what I had considered the "displacement" part of the operand, and will calculate the displacement based on the current PC and the destination label. Given the convenience of this from the programmer's point of view, and @tofro's observations, I'd say this is the way this kind of addressing is intended to be used.

So, assembling the following with vasm:

  org   $80

  jsr   (label,pc,d0)
  nop

label:
  nop

produces a listing file like so:

Sections:
00: "seg80" (80-88)


Source: "vasm-label.asm"
                                     1:   org   $80
                                     2: 
00:00000080 4EBB0004                 3:   jsr   (label,pc,d0)
00:00000084 4E71                     4:   nop
                                     5: 
                                     6: label:
00:00000086 4E71                     7:   nop
                                     8: 


Symbols by name:
label                            A:00000086

Symbols by value:
00000086 label

and assembling the following with as:

  .org  0x80

  jsr   %pc@(label,%d0:w)
  nop

label:
  nop

produces a listing file like so:

68K GAS  as-label.asm           page 1


   1 0000 0000 0000       .org  0x80
   1      0000 0000 
   1      0000 0000 
   1      0000 0000 
   1      0000 0000 
   2                
   3 0080 4EBB 0004       jsr   %pc@(label,%d0:w)
   4 0084 4E71            nop
   5                
   6                label:
   7 0086 4E71            nop
68K GAS  as-label.asm           page 2


DEFINED SYMBOLS
        as-label.asm:6      .text:0000000000000086 label

NO UNDEFINED SYMBOLS

We can see that both assemblers output the same two words for the instruction (as per my original example):

4ebb 0004

Going forward, once all the labels have been properly identified in my disassembly, this will be the most user-friendly format to emit.

Using a direct "displacement"

This is where the two differ, and it comes down to whether they treat the provided "displacement" part of the operand as a displacement or as a destination address.

Going back to vasm, assembling:

  org   $80

  jsr   ($04,pc,d0)
  nop

label:
  nop

produces:

Sections:
00: "seg80" (80-88)


Source: "vasm-disp.asm"
                                     1:   org   $80
                                     2: 
00:00000080 4EBB0082                 3:   jsr   ($04,pc,d0)
00:00000084 4E71                     4:   nop
                                     5: 
                                     6: label:
00:00000086 4E71                     7:   nop
                                     8: 


Symbols by name:
label                            A:00000086

Symbols by value:
00000086 label

showing that the provided displacement ($04) is treated as the base target of the operand, and a negative offset (0x82 or -0x7e) is calculated and emitted.
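The arithmetic behind that listing can be sketched in Python as follows. This is only my reconstruction of what vasm appears to be doing, not its actual implementation, and `encode_disp8` is a hypothetical name.

```python
def encode_disp8(instr_addr, target):
    """Encode the 8-bit displacement for a PC-indirect-with-index operand
    when the operand value is treated as an absolute target address.

    Raises ValueError when the target is not reachable with a signed
    8-bit displacement.
    """
    pc = instr_addr + 2            # PC points at the extension word
    disp = target - pc
    if not -128 <= disp <= 127:
        raise ValueError(f"displacement out of range: {disp}")
    return disp & 0xFF             # two's-complement byte


if __name__ == "__main__":
    print(hex(encode_disp8(0x80, 0x04)))   # operand $04 with org $80
    print(hex(encode_disp8(0x80, 0x86)))   # operand is the label at $86
```

With the instruction at 0x80, a target of 0x04 gives 0x04 - 0x82 = -0x7e, encoded as 0x82, while the label at 0x86 gives 0x04. If the instruction sits more than 128 bytes away from address 0x04 the function raises, which would be consistent with the "displacement out of range" error I saw originally.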

Contrast this with as, where assembling:

  .org  0x80

  jsr   %pc@(0x04,%d0:w)
  nop

label:
  nop

produces:

68K GAS  as-disp.asm            page 1


   1 0000 0000 0000       .org  0x80
   1      0000 0000 
   1      0000 0000 
   1      0000 0000 
   1      0000 0000 
   2                
   3 0080 4EBB 0004       jsr   %pc@(0x04,%d0:w)
   4 0084 4E71            nop
   5                
   6                label:
   7 0086 4E71            nop
68K GAS  as-disp.asm            page 2


DEFINED SYMBOLS
         as-disp.asm:6      .text:0000000000000086 label

NO UNDEFINED SYMBOLS

showing that the provided value (0x04) is considered the displacement, and directly encoded into the output bytes.

In my circumstances, being able to pass the full address of an unresolved label in the operand when using VASM is quite useful when fine-tuning the disassembly algorithm, so this is most likely what I'll be using for now.
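As a rough illustration of how a disassembler could emit the operand for either assembler, here is a Python sketch based on the behaviour observed above. The function name `format_pc_index` and the `syntax` values `"vasm"` and `"gas"` are hypothetical, and the explicit `.w`/`.l` index size suffix in the vasm form is an assumption I have not verified against the listings above.

```python
def format_pc_index(instr_addr, ext_word, syntax):
    """Format a PC-indirect-with-index operand from a brief extension word.

    "vasm": emit the absolute base address, since vasm was observed to
            treat the number as a target and derive the displacement.
    "gas":  emit the raw displacement, since GNU as was observed to
            encode the number directly.
    """
    disp = ext_word & 0xFF
    if disp & 0x80:                               # sign-extend 8-bit displacement
        disp -= 0x100
    reg = ("a" if ext_word & 0x8000 else "d") + str((ext_word >> 12) & 0x7)
    size = "l" if ext_word & 0x0800 else "w"

    if syntax == "vasm":
        target = instr_addr + 2 + disp            # PC is the extension word address
        return f"(${target:x},pc,{reg}.{size})"
    if syntax == "gas":
        return f"%pc@({disp},%{reg}:{size})"
    raise ValueError(f"unknown syntax: {syntax}")


if __name__ == "__main__":
    print("jsr", format_pc_index(0x80, 0x0004, "vasm"))
    print("jsr", format_pc_index(0x80, 0x0004, "gas"))
```

Once labels have been identified, the numeric target in the vasm form could be swapped for the label name without changing the encoding.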

Solution

In my opinion, both

  jsr <displacement>(pc,<data register>)

  jsr (<displacement>,pc,<data register>)

are correct syntax. BUT

No one would ever write such code into an assembler. What every assembler expects (monitor programs might be different) is a (relocatable) label instead of a literal number when calculating PC offsets. It then calculates the numerical displacement from the distance between the current PC and the label. You simply confused that mechanism.

You might find that your syntax is accepted if you use any address register other than PC. Most assemblers simply don't like literal PC offsets. You should expect something similar with short relative branches, like in

 bra.s -4

EDIT:

My assemblers seem to understand such a construct if the displacement is explicitly marked as a relocatable address by relating it to "*" (the current PC) like in

 jsr *-4(pc,d0.w)

(Didn't try vasm, though)