
DOS inserting segment addresses at runtime


I noticed a potential bug in some code I'm writing.

I thought that if I used mov ax, seg segment_name, the program might be non-portable and only work on one machine in a specific configuration, since the load location can vary from machine to machine.

So I decided to disassemble a program containing just that one instruction on two different machines running DOS and I found that the problem was magically solved.

Output of debug on machine one: 0C7A:014C B8BB0C MOV AX,0CBB

Output of debug on machine two: 06CA:014C B80B07 MOV AX,070B

After hex dumping the program I found that the unaltered bytes are actually B84200.

Manually inserting those bytes back into the program results in mov ax, 0042

So does the PE format store references to those instructions and update them at runtime?


Solution

  • As Peter Cordes noted, MS-DOS doesn't use the PE/COFF executable format that Windows uses. It has its own "MZ" executable format, named after the first two bytes of the file, which identify it as being in this format.

    The MZ format supports the use of multiple segments through a relocation table. Each entry is a simple segment:offset value, relative to the start of the load image, that gives the location of a 16-bit segment value that needs to be adjusted based on where the executable was loaded in memory. MS-DOS performs these adjustments by simply adding the actual load segment of the program to the value stored in the executable. This means that without relocations applied, the executable would only work if loaded at segment 0, which happens to be impossible.
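The adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how MS-DOS itself is implemented: the names `apply_relocations` and `make_toy_exe` are hypothetical, the toy file is built by hand to mirror the question's `B8 42 00` bytes, and the load image is simply taken to be everything after the header rather than being sized from the page-count fields.

```python
import struct

# Signature plus the 13 header words that precede offset 1Ch in an MZ file.
HEADER_FMT = "<2s13H"

def apply_relocations(exe: bytes, load_seg: int) -> bytearray:
    """Return the load image with every relocation adjusted for load_seg."""
    fields = struct.unpack_from(HEADER_FMT, exe, 0)
    signature, reloc_count, header_paras, reloc_offset = (
        fields[0], fields[3], fields[4], fields[12])
    if signature != b"MZ":
        raise ValueError("not an MZ executable")

    # Simplification: treat everything after the header as the load image.
    image = bytearray(exe[header_paras * 16:])

    for i in range(reloc_count):
        # Each entry is stored as offset first, then segment (little-endian).
        off, seg = struct.unpack_from("<HH", exe, reloc_offset + 4 * i)
        pos = seg * 16 + off  # position within the load image
        (value,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (value + load_seg) & 0xFFFF)
    return image

def make_toy_exe() -> bytes:
    """Hypothetical test file: one relocation at 0001:014D over B8 42 00."""
    image = bytearray(0x160)
    image[0x15C:0x15F] = b"\xB8\x42\x00"  # mov ax, 0042h at image offset 15Ch
    total = 32 + len(image)
    header = struct.pack(
        HEADER_FMT, b"MZ",
        total % 512, (total + 511) // 512,  # bytes in last page, page count
        1,                                  # one relocation entry
        2,                                  # header size in 16-byte paragraphs
        0, 0xFFFF, 0, 0, 0, 0, 0,           # min/max alloc, SS, SP, checksum, IP, CS
        0x1C,                               # relocation table offset
        0)                                  # overlay number
    reloc = struct.pack("<HH", 0x014D, 0x0001)
    return header + reloc + bytes(image)

exe = make_toy_exe()
print(apply_relocations(exe, 0x0C79)[0x15C:0x15F].hex().upper())
print(apply_relocations(exe, 0x06C9)[0x15C:0x15F].hex().upper())
```

With the two load segments worked out later in this answer, the patched bytes come out as the `B8BB0C` and `B80B07` sequences that debug showed on the two machines.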

    Note that this isn't just necessary for a program to work on multiple machines; it's also necessary for the same program to work reliably on the same machine. The load address can change based on various configuration details, as well as on other programs and drivers that have already been loaded into memory, so the load address of an MS-DOS executable is essentially unpredictable.

    Working backwards from your example, we can tell where your program was loaded into memory on both machines. Since 0042h was relocated into 0CBBh on the first machine and into 070Bh on the second machine, we know MS-DOS loaded your program at segments 0C79h and 06C9h respectively:

    0CBB - 0042 = 0C79
    070B - 0042 = 06C9
    

    From that we can determine that your example executable has the entry 0001:014D, or an equivalent segment:offset value, in its relocation table:

    0C7A:014D - 0C79:0000 = 0001:014D
    06CA:014D - 06C9:0000 = 0001:014D
    

    This entry indicates the unrelocated location of the 16-bit immediate operand of the mov ax, seg segname instruction that needs adjusting.
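The arithmetic above can be double-checked with a short Python sketch. The helper names `linear` and `analyse` are made up for this illustration; the numbers are the ones from the question's debug output, with the operand taken to sit one byte after the `B8` opcode at offset 014Ch.

```python
UNRELOCATED = 0x0042  # operand bytes as stored in the file (B8 42 00)

# (segment, offset of the operand, relocated operand value) seen in debug
observations = {
    "machine one": (0x0C7A, 0x014D, 0x0CBB),
    "machine two": (0x06CA, 0x014D, 0x070B),
}

def linear(seg: int, off: int) -> int:
    """Convert a real-mode segment:offset pair to a linear address."""
    return seg * 16 + off

def analyse(seg: int, off: int, relocated: int, original: int = UNRELOCATED):
    """Recover the load segment and the operand's offset in the load image."""
    load_seg = (relocated - original) & 0xFFFF
    image_offset = linear(seg, off) - linear(load_seg, 0)
    return load_seg, image_offset

results = {name: analyse(*obs) for name, obs in observations.items()}
for name, (load_seg, image_offset) in results.items():
    print(f"{name}: load segment {load_seg:04X}h, "
          f"operand at image offset {image_offset:04X}h")
```

Both machines give the same image offset, 015Dh, which is the linear form of 0001:014D. That is why any segment:offset pair that resolves to 015Dh, such as 0000:015D, would be an equivalent relocation entry.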