
What happens if `objdump -d --start-address` starts printing from the middle of an x86 instruction?


... in other words, is x86-64 machine code self-synchronizing, so that no matter where I start decoding it, decoding always eventually yields the correct disassembly?

Suppose this is the ground-truth disassembly of part of an ELF file. The first instruction is six bytes long, spanning 0x5de5a2 through 0x5de5a7:

  ...
  5de5a2:       0f 84 d0 00 00 00       je     5de678 <_ZN16DefNewGeneration22copy_to_survivor_spaceEP7oopDesc.constprop.0+0x138>
  5de5a8:       48 c1 e8 03             shr    rax,0x3
  5de5ac:       83 e0 0f                and    eax,0xf
  ...
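
As a quick sanity check of this listing, the six bytes split into the two-byte opcode `0f 84` (JE/JZ with a 32-bit displacement) followed by a little-endian rel32 of 0xd0. So the bytes at 0x5de5a4 through 0x5de5a7 are displacement data, not opcodes. The Python sketch below recomputes the branch target from the raw bytes; the variable names are illustrative only.

```python
import struct

# Raw bytes of the first instruction in the listing, and its address.
insn = bytes.fromhex("0f84d0000000")
addr = 0x5de5a2

# 0F 84 is the two-byte opcode for JE/JZ rel32; the next four bytes are a
# signed little-endian 32-bit displacement.
(rel32,) = struct.unpack_from("<i", insn, 2)

# Relative branch targets are measured from the address of the next instruction.
next_addr = addr + len(insn)
target = next_addr + rel32

print(hex(rel32), hex(next_addr), hex(target))  # 0xd0 0x5de5a8 0x5de678
```

The computed target matches the `je 5de678` in the listing.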

If I run `objdump -d --start-address=0x5de5a2`, the output is indeed correct.

What happens if I intentionally pass in a start address that would "split" the first instruction, for example `--start-address=0x5de5a4`?

This is what I get with `objdump -d --start-address=0x5de5a4`:

  ...
  5de5a4:       d0 00                   rol    BYTE PTR [rax],1
  5de5a6:       00 00                   add    BYTE PTR [rax],al
  5de5a8:       48 c1 e8 03             shr    rax,0x3
  5de5ac:       83 e0 0f                and    eax,0xf
  ...

In this specific case, the remnants of `0x5de5a2: je 5de678` are disassembled into junk, but luckily everything from 0x5de5a8 onward is still decoded correctly.
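
One way to see why 0x5de5a4 recovers at 0x5de5a8, and how much the outcome depends on the exact start address, is to run a toy length decoder over the 13 bytes shown above, starting from each address that splits the `je`. This is a hypothetical helper, not objdump's algorithm. Its opcode table (`ONE_BYTE`) covers only the opcodes these particular start addresses run into, it assumes 64-bit mode, and it ignores every prefix except REX. The names `insn_len`, `boundaries` and `resync_offset` are made up for illustration.

```python
# Toy x86-64 *length* decoder (hypothetical helper, not a real disassembler).
# It knows only the opcodes reached from the start addresses tried below,
# assumes 64-bit mode, and handles no prefixes other than REX.
CODE = bytes.fromhex("0f84d0000000" "48c1e803" "83e00f")
BASE = 0x5de5a2

# one-byte opcode -> (takes a ModRM byte?, immediate size in bytes)
ONE_BYTE = {
    0x00: (True, 0),   # add r/m8, r8
    0x83: (True, 1),   # group 1: op r/m32, imm8
    0x84: (True, 0),   # test r/m8, r8
    0xC1: (True, 1),   # group 2: shift/rotate r/m32, imm8
    0xD0: (True, 0),   # group 2: shift/rotate r/m8, 1
    0xE8: (False, 4),  # call rel32
}


def modrm_len(buf: bytes, i: int) -> int:
    """Bytes taken by the ModRM byte at buf[i] plus any SIB byte and displacement."""
    mod, rm = buf[i] >> 6, buf[i] & 0b111
    if mod == 0b11:                        # register operand: ModRM byte only
        return 1
    n = 1
    if rm == 0b100:                        # a SIB byte follows
        n += 1
        if mod == 0b00 and (buf[i + 1] & 0b111) == 0b101:
            return n + 4                   # SIB without base register: disp32
    elif mod == 0b00 and rm == 0b101:
        return n + 4                       # RIP-relative: disp32
    return n + {0b00: 0, 0b01: 1, 0b10: 4}[mod]


def insn_len(buf: bytes, i: int) -> int:
    """Length of the instruction at buf[i]; KeyError if the opcode is outside the toy table."""
    start = i
    if 0x40 <= buf[i] <= 0x4F:             # REX prefix
        i += 1
    op = buf[i]
    if op == 0x0F and 0x80 <= buf[i + 1] <= 0x8F:
        return i + 2 + 4 - start           # Jcc rel32
    has_modrm, imm = ONE_BYTE[op]
    i += 1
    if has_modrm:
        i += modrm_len(buf, i)
    return i + imm - start


def boundaries(buf: bytes, start: int) -> list[int]:
    """Instruction start offsets seen by a linear sweep from `start`, plus the end offset."""
    out, i = [], start
    while i < len(buf):
        out.append(i)
        i += insn_len(buf, i)
    out.append(i)
    return out


def resync_offset(buf: bytes, start: int):
    """First offset where a sweep from `start` lands on a true boundary (sweep from 0)."""
    truth = set(boundaries(buf, 0))
    return next((b for b in boundaries(buf, start) if b in truth), None)


for start in range(1, 6):                  # every address inside the 6-byte je
    sync = resync_offset(CODE, start)
    print(f"start {BASE + start:#x}: back in step at {BASE + sync:#x}")
```

Under these assumptions, starting at 0x5de5a4 or 0x5de5a6 realigns at 0x5de5a8, which agrees with the objdump output above. Starting at 0x5de5a3, 0x5de5a5 or 0x5de5a7 stays misaligned through the `shr` (for example, `00 48 c1` decodes as an `add` with a disp8, and `e8 03 83 e0 0f` as a `call rel32`) and only realigns at 0x5de5af, the end of the bytes shown. Treat these as predictions of the toy table rather than verified objdump output.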

What I want to ask is: is this a property I can always rely on? Can I let objdump start disassembling an x86 ELF file at an arbitrary start address in the text section, even if that address splits a legitimate instruction, and expect objdump to "eventually" print the correct disassembly?

Extra question: How far into the disassembly can the damage from a misaligned start address extend? Does this property also hold for other ISAs?


Solution

  • x86 code is not self-synchronizing. If you start disassembling at an address that is not the proper start of an instruction, you may or may not get correct code back after some number of bad instructions, but this is not something you can count on.

    In the early days of x86, people were even known to write code that depended on this non-self-synchronizing property, for example jmp instructions that deliberately landed in the middle of another instruction.

    There are ISAs that are self-synchronizing; in particular, machines with fixed-length, aligned instructions are self-synchronizing pretty much by definition. I am not aware of any machine that has variable-length instructions the way x86 does and is also self-synchronizing, but I am by no means an expert on that question.
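
The contrast between fixed-length and variable-length encodings can be illustrated by feeding two made-up length rules to the same linear sweep. Both `fixed4` and `var_len` are hypothetical ISAs, not real encodings. `var_len` is only loosely modelled on x86, where a run of zero bytes decodes as a chain of two-byte `add BYTE PTR [rax],al` instructions, so starting one byte late keeps you out of phase for as long as the run lasts.

```python
def boundaries(length_at, buf: bytes, start: int) -> list[int]:
    """Instruction start offsets found by a linear sweep that begins at `start`."""
    out, i = [], start
    while i < len(buf):
        out.append(i)
        i += length_at(buf, i)
    return out


def first_common(length_at, buf: bytes, start: int):
    """First offset where a sweep from `start` hits a true boundary (sweep from 0), else None."""
    truth = set(boundaries(length_at, buf, 0))
    return next((b for b in boundaries(length_at, buf, start) if b in truth), None)


def fixed4(buf: bytes, i: int) -> int:
    """Hypothetical fixed-length ISA: every instruction is 4 bytes and 4-byte aligned,
    so only multiples of 4 are valid start addresses."""
    return 4


def var_len(buf: bytes, i: int) -> int:
    """Hypothetical variable-length ISA: opcode 0x00 takes 2 bytes, anything else 1."""
    return 2 if buf[i] == 0x00 else 1


long_run = bytes(1000) + b"\x90"       # 1000 zero bytes, then one 1-byte opcode
short_run = bytes(10) + b"\x90\x90"    # 10 zero bytes, then two 1-byte opcodes

print(first_common(fixed4, long_run, 8))    # 8: any aligned start is a real boundary
print(first_common(var_len, short_run, 1))  # 11: back in step once the zero run ends
print(first_common(var_len, long_run, 1))   # None: out of step for the whole buffer
```

In this model, the number of wrong instructions after a misaligned start has no fixed upper bound; it depends entirely on the bytes that follow. That is one way of reading the point that recovering correct code is not something you can count on.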