Tags: x86, low-level, machine-code

How can I write raw machine code for x86 without using assembly?


I'd like to be able to write raw machine code, without assembly or any other sort of higher-level language, that can be put directly onto a flash drive and run. I already know that for this to work, I need to write master boot record headers onto the drive, which I have managed to do manually. I have also successfully gotten a line of text to display on the screen using assembly code in the first sector (in this case, the first 512 bytes) of the drive. However, I would like to be able to write raw hex code onto the drive, like I did for the MBR formatting, without any sort of tool like an assembler to help me. I know that there is a way to do this, but I haven't been able to find anything that doesn't mention assembly. Where can I find information about this? Googling "machine code" or "x86 programming" comes up with assembly, which isn't what I want.


Solution

  • If what you really want is to understand x86 machine code better, I'd recommend you start by looking at the output of an assembler to see what bytes it assembled into the output file for each line of asm source.

    nasm -fbin -l listing.txt foo.asm gives you a listing that includes the raw hex bytes next to each source line, or nasm -fbin -l/dev/stdout foo.asm | less pipes the listing right into a text viewer. See this chroma-key blend function in 13 bytes of x86 machine code I wrote on codegolf.SE for an example of what the output looks like.

    You can also disassemble a binary file after creating it normally. ndisasm works on flat binaries, and produces the same format of hex bytes + asm instruction. Other disassemblers like objdump are also usable: Disassembling A Flat Binary File Using objdump.

    Semi-related: How to turn hex code into x86 instructions


    Intel's x86 manuals fully specify how instructions are encoded. See the volume 2 instruction set reference manual, Chapter 2 INSTRUCTION FORMAT, for a breakdown of prefixes, opcodes, ModR/M + optional SIB and optional displacement, and immediate.
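As a rough illustration of the ModR/M part of that format, here is a minimal Python sketch that splits a ModR/M byte into its three bit fields. The function name split_modrm is made up for this example; only the 2/3/3 bit layout comes from the manual.

```python
def split_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M is a single byte")
    mod = (byte >> 6) & 0b11   # bits 7..6: addressing mode (3 = register-direct)
    reg = (byte >> 3) & 0b111  # bits 5..3: register operand or opcode extension
    rm = byte & 0b111          # bits 2..0: register or memory operand
    return mod, reg, rm


print(split_modrm(0xE0))  # (3, 4, 0)
print(split_modrm(0xD8))  # (3, 3, 0)
```

Running bytes from an assembler listing through a helper like this is one way to check that you are reading the encoding tables correctly.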

    Given that, you can read the per-instruction documentation to see how each instruction is encoded. For example, D1 /4 (shl r/m32, 1) means the opcode byte is D1 and the reg field of ModRM must be 4. (For some instructions the reg field works as 3 additional opcode bits; the manual writes that as /digit, while /r means the reg field holds a register operand.)
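To make the D1 /4 example concrete, here is a small Python sketch that builds the two bytes for shl on a 32-bit register. It assumes a register-direct operand (mod = 3) and 32-bit operand size; in 16-bit code the same bytes would operate on the 16-bit register unless a 66 prefix is added. The helper name and the REG32 table are illustrative, not taken from any tool.

```python
# Register numbers as used in the ModR/M r/m field for 32-bit registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_shl_reg_1(reg_name: str) -> bytes:
    """Encode `shl reg32, 1` as opcode D1 with /4 in the ModR/M reg field."""
    opcode = 0xD1
    mod = 0b11                 # register-direct operand
    reg_ext = 4                # the "/4" opcode extension
    rm = REG32[reg_name]       # which register is shifted
    modrm = (mod << 6) | (reg_ext << 3) | rm
    return bytes([opcode, modrm])


print(encode_shl_reg_1("eax").hex(" "))  # d1 e0
print(encode_shl_reg_1("ecx").hex(" "))  # d1 e1
```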

    There's also an appendix mapping opcode-bytes back to instructions, and other sections in that manual.

    You can of course use a hex editor to type in the encodings you work out manually to create a 512-byte binary file without using an assembler. But this is a pointless exercise.
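If typing into a hex editor gets tedious, the same hand-worked bytes can be written out with a few lines of Python. This is only a sketch under some assumptions: the BIOS loads the sector and jumps to its first byte in 16-bit real mode, the teletype call (int 0x10 with AH = 0x0E) is available, and the sector carries no partition table or filesystem header, so the padding would overwrite one if you rely on it. The names CODE, build_boot_sector and write_image are hypothetical.

```python
# Hand-encoded 16-bit real-mode code; the comments show the asm each
# byte sequence corresponds to.
CODE = bytes([
    0x31, 0xDB,  # xor bx, bx    ; BH = page 0 for the BIOS call
    0xB4, 0x0E,  # mov ah, 0x0E  ; BIOS teletype-output function
    0xB0, 0x41,  # mov al, 'A'   ; character to print
    0xCD, 0x10,  # int 0x10      ; call the BIOS video service
    0xF4,        # hlt           ; wait for an interrupt
    0xEB, 0xFD,  # jmp short -3  ; loop back to the hlt
])


def build_boot_sector(code: bytes) -> bytes:
    """Pad code to 510 bytes and append the 55 AA boot signature."""
    if len(code) > 510:
        raise ValueError("code does not fit in one boot sector")
    return code.ljust(510, b"\x00") + b"\x55\xAA"


def write_image(path: str, code: bytes = CODE) -> None:
    """Write a 512-byte boot sector image to `path`."""
    with open(path, "wb") as f:
        f.write(build_boot_sector(code))
```

Calling write_image("boot.bin") would produce a file you can copy to the first sector of a drive or boot in an emulator.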


    See also tips for golfing in x86 machine code for a lot of quirks of x86 instruction encoding: e.g. there are single-byte encodings for inc/dec a full register (except in 64-bit mode). It's of course focused on instruction length, but unless you insist on looking up the actual encodings yourself, the interesting part is which forms of instructions have different or special encodings available. Several of the answers on that tips Q&A have output from objdump -d showing machine-code bytes and AT&T syntax disassembly.
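As one example of such a special encoding, the single-byte inc/dec forms put the register number directly into the opcode byte (40+r and 48+r), while the general forms use FF /0 and FF /1 with a ModR/M byte. The sketch below shows both for 32-bit registers; the function names are made up for illustration. In 64-bit mode the 40..4F bytes are REX prefixes instead, so only the two-byte forms apply there.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def inc_short(reg: str) -> bytes:
    """One-byte `inc reg32`: opcode 40+r (not valid in 64-bit mode)."""
    return bytes([0x40 + REG32[reg]])


def dec_short(reg: str) -> bytes:
    """One-byte `dec reg32`: opcode 48+r (not valid in 64-bit mode)."""
    return bytes([0x48 + REG32[reg]])


def inc_modrm(reg: str) -> bytes:
    """Two-byte `inc reg32`: FF /0 with a register-direct ModR/M byte."""
    return bytes([0xFF, 0xC0 | (0 << 3) | REG32[reg]])


def dec_modrm(reg: str) -> bytes:
    """Two-byte `dec reg32`: FF /1 with a register-direct ModR/M byte."""
    return bytes([0xFF, 0xC0 | (1 << 3) | REG32[reg]])


print(inc_short("ecx").hex(" "), "vs", inc_modrm("ecx").hex(" "))  # 41 vs ff c1
print(dec_short("ecx").hex(" "), "vs", dec_modrm("ecx").hex(" "))  # 49 vs ff c9
```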