How to count the number of bytes between instructions

There's this piece of Keil assembly code in Valvano's book (3.3.3 Memory Access Instruction):

; Keil Syntax
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
;outside of execution
PAaddr DCD 0x400043FC

The first line LDR R5, PAaddr gets translated by the assembler to

LDR R5, [PC, #16]

where the #16 represents the number of bytes between the MOV R6, #0x55 and the DCD definition.

I can't understand how the #16 came about. According to Keil's ARM and Thumb Instructions, MOV is a 16-bit instruction (hence 2 bytes). I can't find the instruction size for STR or DCD, but from reading ARM's instruction set summary, STR takes twice as many cycles as MOV, so I would intuitively guess that STR is twice the size of MOV (4 bytes). DCD just stores the value in ROM, so it can't be any bigger than MOV. If I sum the sizes in bytes (2 for MOV, 4 for STR, and perhaps 1 or 2 for DCD), I should get 7 or 8 bytes from the second instruction to the last, that is, a #7 or #8 offset from PC instead.

Solution

I don't have Keil handy, but it doesn't really matter: you didn't provide enough information (what is your target architecture/core?), and not all of this is well documented by ARM.

So, generic Thumb (GNU assembler syntax):

.thumb
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC

Disassembly of section .text:

00000000 <PAaddr-0x8>:
   0:   4d01        ldr r5, [pc, #4]    ; (8 <PAaddr>)
   2:   2655        movs    r6, #85 ; 0x55
   4:   602e        str r6, [r5, #0]
   6:   46c0        nop         ; (mov r8, r8)

00000008 <PAaddr>:
   8:   400043fc    .word   0x400043fc

From the ARM ARM, the immediate for LDR (literal) is: "The immediate offset added to the Align(PC, 4) value of the instruction to form the address. Permitted values are multiples of four in the range 0-1020 for encoding T1."

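In Thumb state the PC reads as the address of the instruction plus 4, and Align rounds down. So Align(0x00 + 4, 4) = 0x04, and 0x08 - 0x04 = 4 bytes = one word. The encoding is 0x4D01, where the 01 is the immediate, counted in words.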
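The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the T1 encoding (bits 01001, then a 3-bit register, then an 8-bit word offset); the function names are made up for this example and are not part of any toolchain.

```python
def align_down(value, boundary):
    """Round value down to a multiple of boundary, like Align() in the ARM pseudocode."""
    return value - (value % boundary)


def encode_ldr_literal_t1(rt, instr_addr, literal_addr):
    """Return the 16-bit Thumb T1 encoding of LDR Rt, [PC, #imm].

    Hypothetical helper for illustration only.
    """
    base = align_down(instr_addr + 4, 4)   # Thumb PC reads as instruction address + 4
    offset = literal_addr - base
    if not 0 <= rt <= 7:
        raise ValueError("T1 can only target R0-R7")
    if offset % 4 != 0 or not 0 <= offset <= 1020:
        raise ValueError("offset must be a multiple of 4 in the range 0-1020")
    return 0x4800 | (rt << 8) | (offset // 4)


# LDR R5 at address 0x00, literal word at 0x08
print(hex(encode_ldr_literal_t1(5, 0x00, 0x08)))  # 0x4d01
```

The same call with the LDR at 0x02 and the literal still at 0x08 gives the same halfword, because Align(0x06, 4) is also 0x04.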

.thumb
nop
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC


00000000 <PAaddr-0x8>:
   0:   46c0        nop         ; (mov r8, r8)
   2:   4d01        ldr r5, [pc, #4]    ; (8 <PAaddr>)
   4:   2655        movs    r6, #85 ; 0x55
   6:   602e        str r6, [r5, #0]

00000008 <PAaddr>:
   8:   400043fc    .word   0x400043fc

The LDR is now at 0x02, so PC reads as 0x06 and Align(0x02 + 4, 4) = 0x04. 0x08 - 0x04 = 0x04, one word, 0x4D01 encoding.

.cpu cortex-m3
.thumb
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC

Disassembly of section .text:

00000000 <PAaddr-0x8>:
   0:   4d01        ldr r5, [pc, #4]    ; (8 <PAaddr>)
   2:   2655        movs    r6, #85 ; 0x55
   4:   602e        str r6, [r5, #0]
   6:   bf00        nop

00000008 <PAaddr>:
   8:   400043fc    .word   0x400043fc

No change to the LDR, but with unified syntax the MOV becomes a 32-bit instruction:

.cpu cortex-m3
.syntax unified
.thumb
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC

Disassembly of section .text:

00000000 <PAaddr-0x8>:
   0:   4d01        ldr r5, [pc, #4]    ; (8 <PAaddr>)
   2:   f04f 0655   mov.w   r6, #85 ; 0x55
   6:   602e        str r6, [r5, #0]

00000008 <PAaddr>:
   8:   400043fc    .word   0x400043fc

and with a leading nop, so that the LDR no longer starts on a word boundary:

.cpu cortex-m3
.syntax unified
.thumb
nop
LDR R5, PAaddr
MOV R6, #0x55
STR R6, [R5]
.align
PAaddr: .word 0x400043FC

Disassembly of section .text:

00000000 <PAaddr-0xc>:
   0:   bf00        nop
   2:   4d02        ldr r5, [pc, #8]    ; (c <PAaddr>)
   4:   f04f 0655   mov.w   r6, #85 ; 0x55
   8:   602e        str r6, [r5, #0]
   a:   bf00        nop

0000000c <PAaddr>:
   c:   400043fc    .word   0x400043fc

The LDR is at 0x02, so PC reads as 0x06 and Align(0x02 + 4, 4) = 0x04. 0x0C - 0x04 = 0x08, 2 words, 0x4D02 encoding.
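To see how the instruction sizes and the padding feed into that offset, here is a rough Python sketch of the layout. It assumes the code starts at address 0 and that the literal word follows the last instruction after padding up to a 4-byte boundary (which is what .align did in the listings above); literal_offset is a hypothetical helper, not something the assembler exposes.

```python
def literal_offset(sizes, ldr_index):
    """Lay out instructions of the given byte sizes starting at address 0.

    Returns (ldr_addr, literal_addr, offset), where the literal word is
    placed after the last instruction, padded up to a 4-byte boundary.
    Hypothetical helper for illustration only.
    """
    addrs = []
    addr = 0
    for size in sizes:
        addrs.append(addr)
        addr += size
    literal_addr = (addr + 3) & ~3   # pad up to the next word boundary
    ldr_addr = addrs[ldr_index]
    base = (ldr_addr + 4) & ~3       # Align(PC, 4) with PC = ldr_addr + 4
    return ldr_addr, literal_addr, literal_addr - base


# nop (2 bytes), ldr (2), mov.w (4), str (2): the last listing above
print(literal_offset([2, 2, 4, 2], ldr_index=1))  # (2, 12, 8)
```

An offset of 8 bytes is 2 words, which matches the 02 in 0x4D02. Note that the offset depends on where the LDR sits and where the padded literal ends up, not just on the sum of the sizes in between.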

You can do the same thing with Keil's assembly language instead of the GNU syntax shown above.

It's not your job to count unless you are writing your own assembler (or trying to create your own machine code for some other reason).

In any case, simply read the ARM architecture documentation for the architecture in question, and compare it with the output of a known-good assembler for further clarification as needed.

Edit

From the early/original ARM ARM

address = (PC[31:2] << 2) + (immed_8 * 4)
Rd = Memory[address, 4]

this one makes more sense IMO.

When in doubt go back to the old/original-ish ARM ARM.
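As a sanity check, the older formula can be transcribed almost directly into Python and run against one of the halfwords from the disassembly. This is a sketch under the assumption that PC is the instruction address plus 4 (Thumb state); decode_ldr_literal_t1 is a name invented for this example.

```python
def decode_ldr_literal_t1(halfword, instr_addr):
    """Return (rd, address) for a 16-bit Thumb 'LDR Rd, [PC, #imm]' halfword.

    Hypothetical helper for illustration only.
    """
    if halfword >> 11 != 0b01001:
        raise ValueError("not a Thumb LDR (literal) T1 encoding")
    rd = (halfword >> 8) & 0x7
    immed_8 = halfword & 0xFF
    pc = instr_addr + 4                        # value the instruction sees in PC
    address = ((pc >> 2) << 2) + immed_8 * 4   # (PC[31:2] << 2) + (immed_8 * 4)
    return rd, address


# 0x4D02 located at address 0x02
print(decode_ldr_literal_t1(0x4D02, 0x02))  # (5, 12)
```

Address 12 is 0x0C, which is where the listing with the leading nop puts PAaddr.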

Most(ish) recent ARM ARM

if ConditionPassed() then
  EncodingSpecificOperations(); NullCheckIfThumbEE(15);
  base = Align(PC,4);
  address = if add then (base + imm32) else (base - imm32);
  data = MemU[address,4];
  if t == 15 then
    if address<1:0> == '00' then LoadWritePC(data); else UNPREDICTABLE;
  elsif UnalignedSupport() || address<1:0> == '00' then
    R[t] = data;
  else // Can only apply before ARMv7
    if CurrentInstrSet() == InstrSet_ARM then
      R[t] = ROR(data, 8*UInt(address<1:0>));
    else
      R[t] = bits(32) UNKNOWN;

But that covers T1, T2 and A1 encodings in one shot, making it the most confusing.

In any case, both versions describe what is going on with the encoding as well as the overall size of each instruction.