Search code examples
assemblyobjdump

Trying to assemble the output of an disassembler (such as objdump)


Possible Duplicate:
Disassembling, modifying and then reassembling a Linux executable

I've been told that assembly and dissassembly are not inverses. Apparently, you can't dissassemble a program, put that output directly into an assembler, and expect it to run correctly because information is lost.

My question is, why is information lost? Also, what information is lost?


Solution

  • One important thing that disassemblers (or their users) routinely do not preserve is the instruction encoding. Some instructions can be encoded in multiple different ways, e.g.:

    mov rdx, -1 is either 48,BA,FF,FF,FF,FF,FF,FF,FF,FF (10 bytes) or 48,C7,C2,FF,FF,FF,FF (7 bytes).

    If the rest of the program somehow functionally depends on the length of the above instruction being exactly 10 (or 7) bytes or on those specific byte values and the assembler chooses to assemble mov rdx, -1 differently from what it was in the original program, then after disassembly+assembly you get a different program that will work differently. For instructions with ambiguous encoding the assembler must use not the instruction mnemonic (mov rdx, -1) but its exact encoding in the disassembly of the original program (e.g. 48,BA,FF,FF,FF,FF,FF,FF,FF,FF).

    There may be other things that the assembler or linker may do differently (e.g. do additional aligning of code/data, name and order things (sections/segments) differently in the output file), which usually aren't a problem, but, again, if there're some unusual dependencies on these things in the original program, then, the reassembled program will work differently.