
How to make Watcom C compiler (wcc) and WASM generate the same 8086 machine code?


For this C source code:

int add(int a, int b) { return a + b; }

, the Watcom C Compiler for 8086 (wcc -s -ms -os -0 prog.c) generates the following machine code (hex): 01 D0 C3, disassembling to add ax, dx (01 D0) + ret (C3).

For this assembly source code:

PUBLIC  add_
EXTRN   _small_code_:BYTE
_TEXT SEGMENT BYTE PUBLIC USE16 'CODE'
add_: add ax, dx
      ret
_TEXT ENDS
END

, the Watcom Assembler (WASM, wasm -ms -0 prog.wasm) generates the following machine code (hex): 03 C2 C3, disassembling to add ax, dx (03 C2) + ret (C3).

Thus they generate a different binary encoding of the same 8086 assembly instruction add ax, dx.

FYI, if I implement the function in Watcom C inline assembly, the machine code output is the same as with WASM.

A collection of different instruction encodings:

  • add ax, dx. wcc: 01 D0; wasm: 03 C2.
  • mov bx, ax. wcc: 89 C3; wasm: 8B D8.
  • add ax, byte 9. wcc: 05 09 00; wasm: 83 C0 09.
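For context, the first two rows differ only in the direction bit of the opcode: the 8086 has both an "op r/m16, r16" and an "op r16, r/m16" opcode for these instructions, and for a register-to-register operation either one works if the reg and r/m fields of the ModRM byte are swapped. A minimal C sketch that reproduces both byte pairs (the function and enum names are made up for illustration, they come from neither tool):

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit register numbers as they appear in the ModRM byte. */
enum { REG_AX = 0, REG_CX = 1, REG_DX = 2, REG_BX = 3 };

/* Register-direct ModRM byte: mod = 11, then the reg field, then r/m. */
uint8_t modrm_reg_direct(unsigned reg, unsigned rm)
{
    assert(reg < 8 && rm < 8);
    return (uint8_t)(0xC0 | (reg << 3) | rm);
}

/* "op r/m16, r16" form (direction bit clear): destination sits in r/m.
 * This is the form seen in the wcc output (01 for add, 89 for mov). */
void encode_store_form(uint8_t opcode, unsigned dst, unsigned src,
                       uint8_t out[2])
{
    out[0] = opcode;
    out[1] = modrm_reg_direct(src, dst);
}

/* "op r16, r/m16" form (direction bit set): destination sits in reg.
 * This is the form seen in the wasm output (03 for add, 8B for mov). */
void encode_load_form(uint8_t opcode, unsigned dst, unsigned src,
                      uint8_t out[2])
{
    out[0] = (uint8_t)(opcode | 0x02);
    out[1] = modrm_reg_direct(dst, src);
}
```

Calling encode_store_form(0x01, REG_AX, REG_DX, buf) gives 01 D0, and encode_load_form(0x01, REG_AX, REG_DX, buf) gives 03 C2. The third row is a different kind of choice: 05 is the accumulator form with a 16-bit immediate, while 83 /0 is the form with a sign-extended 8-bit immediate.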

How can I make the Watcom C compiler (for C code) and WASM generate the instructions with the same binary encoding? Is there a command-line flag or some other configuration option for either? I wasn't able to find any.

The reason I need this is that I'd like to reproduce an executable program file written in Watcom C by writing WASM source only, and I want the final output to be bit-by-bit identical to the original.


Solution

  • This answer is inspired by a comment by @RaymondChen.

    Here is a cumbersome, multistep way to change the machine code emitted by wcc to match the output of wasm:

    1. Compile the C source code with wcc (part of OpenWatcom) to an .obj file as usual.

    2. Use dmpobj (part of OpenWatcom) to extract the machine code bytes of the _TEXT segment.

    3. Use ndisasm (part of NASM, ndisasm -b 16 file.obj) to disassemble the machine code bytes.

    4. Write and run a custom source text filter to keep the assembly instructions only and convert them to WASM syntax.

    5. Use wasm (part of OpenWatcom) to generate the 2nd .obj file.

    6. Use dmpobj to extract the machine code bytes of the _TEXT segment of the 2nd .obj file.

    7. Write and run a custom binary filter to replace the machine code bytes in the _TEXT segment of the 1st .obj file with the equivalent bytes extracted from the 2nd .obj file, using the offsets in the outputs of the dmpobj invocations.
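As a starting point for the text filter in step 4, here is a minimal C sketch of the "keep the assembly instructions only" part. It assumes the usual three-column ndisasm layout (offset, hex bytes, instruction text); the function name is hypothetical, and the actual conversion from NASM syntax to WASM syntax is not attempted here.

```c
#include <ctype.h>
#include <stddef.h>

/* Return a pointer to the instruction text inside one ndisasm output
 * line, or NULL if the line does not look like
 * "<hex offset> <hex bytes> <instruction>".
 * Lines that do not start with a hex field (for example continuation
 * lines that begin with '-') are rejected so the caller can skip them. */
const char *ndisasm_instruction_text(const char *line)
{
    const char *p = line;
    int field;

    /* Skip exactly two whitespace-delimited hex fields. */
    for (field = 0; field < 2; field++) {
        const char *start;
        while (*p == ' ' || *p == '\t')
            p++;
        start = p;
        while (isxdigit((unsigned char)*p))
            p++;
        if (p == start || (*p != ' ' && *p != '\t'))
            return NULL;
    }
    while (*p == ' ' || *p == '\t')
        p++;
    return (*p != '\0' && *p != '\n') ? p : NULL;
}
```

Only two fields are skipped on purpose: mnemonics such as add consist entirely of hex digits, so a "skip all hex" loop would eat them.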

    These steps avoid using wdis -a (conversion from .obj to assembly source), because that's lossy (it doesn't include everything in the .obj file), which can potentially make unwanted changes, causing problems later.
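The binary filter in step 7 could be sketched in C roughly as follows. It assumes the whole .obj file is already loaded into memory and that the byte offsets have been worked out from the dmpobj output; both function names are hypothetical. The second helper is there because every OMF record ends with a checksum byte chosen so that all bytes of the record sum to 0 modulo 256, so changing code bytes inside a record generally means updating that byte too (some tools write 0 there instead, so check what the original file does).

```c
#include <stddef.h>
#include <string.h>

/* Replace n bytes at `offset` in `buf`, but only if the bytes currently
 * there equal `expected`. Returns 0 on success, -1 on any mismatch or
 * out-of-range offset, leaving the buffer untouched in that case. */
int patch_bytes(unsigned char *buf, size_t buf_len, size_t offset,
                const unsigned char *expected,
                const unsigned char *replacement, size_t n)
{
    if (offset > buf_len || n > buf_len - offset)
        return -1;
    if (memcmp(buf + offset, expected, n) != 0)
        return -1;
    memcpy(buf + offset, replacement, n);
    return 0;
}

/* Recompute the trailing checksum byte of one OMF record.
 * `rec` points at the record type byte and `rec_len` is the full record
 * size: 3 header bytes plus the value of the 16-bit length field. */
void omf_fix_checksum(unsigned char *rec, size_t rec_len)
{
    unsigned sum = 0;
    size_t i;

    if (rec_len == 0)
        return;
    for (i = 0; i + 1 < rec_len; i++)
        sum += rec[i];
    rec[rec_len - 1] = (unsigned char)(0u - sum);
}
```

Checking the expected bytes before overwriting makes a wrong offset fail loudly instead of silently corrupting the file.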