Tags: assembly, operating-system, cpu-architecture, low-level, machine-code

Do computer programs/OSes consist of only the X86-64 instructions at low level?


I am sorry for a newbie/stupid question, but this has bothered me for some time and a straight answer seems difficult to find. The question is about how computers work at a low level: more specifically, whether there are commands the computer can execute that are NOT included in the x86-64 instruction set. Put differently, you could ask whether an OS is programmed using only x86-64 instructions, and likewise for the programs the OS runs. Note that I am not asking about hidden commands or additional commands specific to a processor; we may assume those do not exist.

Motivation for the question:

  • The account that is often given is that the compiler compiles a program written in a specific language to machine code. But there are many commands that cannot (to my knowledge) be written in assembly with only the x86-64 instructions, even something as simple as "malloc". So it appears that actual programs written for an OS consist of machine code and OS instructions?

  • Looking at the x86-64 instruction set, it seems that not every I/O operation (access to keyboard, mouse, hard drive, GPU, audio interface, time, monitor, speakers, etc.) has an instruction of its own, although the INT instruction can be used to accomplish some of these tasks. According to this answer, "On modern architectures, peripherals are accessed in a similar way to memory: via mapped memory addresses on a bus.", whatever that means in terms of code. So it appears even the OS is not written only in x86-64 instructions?


Solution

  • Yes, CPUs can only run machine code (which you can 1:1 represent via asm). For some languages, ahead-of-time compilers turn source into machine code in an executable.

    For others, e.g. Java, it's typical to JIT-compile to machine code in a buffer in memory on the fly, then call it. (The code that does the JIT compiling was itself written in a language like C++ and compiled ahead-of-time to machine code that ships as part of the JVM.)

    In other language implementations, you just have an interpreter: it's a program (normally written in an ahead-of-time compiled language like C or C++) that reads a file (e.g. a bash or python script) and parses it, deciding which of its existing functions to call with what args based on the contents of the file. Every instruction that runs was originally in the binary, but there are conditional branches in that interpreter code that depend on the high-level-language code in the file you ran it on.
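    To make the interpreter case concrete, here is a minimal sketch of that idea. The mini-language (push/add/mul, one command per line) and every name in it are hypothetical, invented only for illustration. The point is that all the functions exist before the script is read; the script is just data that decides which of them gets called.

```python
# Toy interpreter: every function that can ever run is defined here, ahead of time.
# The "script" is only data that steers which of them gets called.

def op_push(stack, arg):
    stack.append(int(arg))

def op_add(stack, arg):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def op_mul(stack, arg):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)

# Hypothetical mini-language: one command per line, e.g. "push 2".
DISPATCH = {"push": op_push, "add": op_add, "mul": op_mul}

def run(script):
    stack = []
    for line in script.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        arg = parts[1] if len(parts) > 1 else None
        handler = DISPATCH[name]  # which branch is taken depends on the script's contents
        handler(stack, arg)
    return stack

print(run("push 2\npush 3\nadd\npush 4\nmul"))  # [20]
```

    A real interpreter such as CPython or bash is far larger, but the shape is the same: the CPU only ever executes the interpreter's own pre-compiled machine code.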


    malloc isn't a fundamental operation, it's a library function (compiled to machine code) which might make some system calls (involving running some machine code in the kernel).
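    As a rough illustration of that layering, the sketch below models a bump allocator on top of a fake kernel. ToyKernel, ToyMalloc, and grow_heap are made-up names, and real allocators (glibc's malloc, for example) are much more sophisticated; this only shows that "malloc" is ordinary library code that occasionally asks the OS for more memory.

```python
PAGE = 4096

class ToyKernel:
    """Stand-in for the OS: hands out more heap on request (in the spirit of brk/mmap)."""
    def __init__(self):
        self.heap_end = 0
        self.syscalls = 0

    def grow_heap(self, nbytes):
        self.syscalls += 1
        old_end = self.heap_end
        self.heap_end += nbytes
        return old_end

class ToyMalloc:
    """Bump allocator: plain library code layered on top of grow_heap()."""
    def __init__(self, kernel):
        self.kernel = kernel
        self.next_free = 0
        self.limit = 0

    def malloc(self, size):
        size = (size + 15) & ~15  # round up to a multiple of 16 bytes
        if self.next_free + size > self.limit:
            # Out of room: ask the (fake) kernel for whole pages.
            pages = (size + PAGE - 1) // PAGE
            self.kernel.grow_heap(pages * PAGE)
            self.limit = self.kernel.heap_end
        addr = self.next_free
        self.next_free += size
        return addr

kernel = ToyKernel()
heap = ToyMalloc(kernel)
print(heap.malloc(24), heap.malloc(100), kernel.syscalls)  # 0 32 1
```

    Two allocations, but only one trip into the "kernel": most malloc calls never make a system call at all.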

    With a full-system emulator like BOCHS, you can literally single-step machine instructions through any program, into system calls, and even into interrupt handlers. You will never find the CPU executing anything that isn't machine-code instructions; that's literally the only thing its logic circuits know how to decode after fetching from memory. (Being decodable by the CPU is what makes it machine code.)

    Machine code always consists of a sequence of instructions, and every ISA has an assembly language that we can use for human-readable representations of machine code. (Related: Why do we even need assembler when we have compiler? re: the existence of assembly language instead of just machine code.)

    Also, the instruction format of any given ISA is at least somewhat consistent. On x86-64 it's a byte-stream of opcode, operands (modrm + optional other bytes), and optional immediate. (Also prefixes... x86-64 is kind of a mess.) On AArch64, machine instructions are fixed-width 4 bytes, aligned on 4-byte boundaries.
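    A small sketch of the two formats, using Python only as a byte-poking calculator. The helper names split_modrm and aarch64_words are made up for this example. The x86-64 bytes 89 07 encode mov [rdi], eax (opcode 0x89, then a ModRM byte), and the AArch64 bytes are nop followed by ret.

```python
def split_modrm(byte):
    """Split an x86 ModRM byte into its (mod, reg, rm) bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

def aarch64_words(code):
    """Cut AArch64 machine code into fixed 4-byte little-endian instruction words."""
    if len(code) % 4 != 0:
        raise ValueError("AArch64 code length must be a multiple of 4 bytes")
    return [int.from_bytes(code[i:i + 4], "little") for i in range(0, len(code), 4)]

# x86-64: mov [rdi], eax -> opcode 0x89 (MOV r/m32, r32), then one ModRM byte.
opcode, modrm = bytes([0x89, 0x07])
# mod=0: memory operand, no displacement; reg=0: eax; rm=7: rdi
print(hex(opcode), split_modrm(modrm))  # 0x89 (0, 0, 7)

# AArch64: nop ; ret (each instruction is exactly one 32-bit word)
a64 = bytes.fromhex("1f2003d5" "c0035fd6")
print([hex(w) for w in aarch64_words(a64)])  # ['0xd503201f', '0xd65f03c0']
```

    On x86-64 the decoder has to look at the opcode and ModRM byte to learn how long the instruction is; on AArch64 the next instruction always starts 4 bytes later.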

    "On modern architectures, peripherals are accessed in a similar way to memory: via mapped memory addresses on a bus."

    That means executing a store instruction like x86-64 mov [rdi], eax to store 4 bytes into memory at address=RDI. Logic inside the CPU (or northbridge in older systems) decides whether a given physical address is DRAM or I/O based on the address, rather than based on the instruction.

    Alternatively, x86-64 also has instructions that access a separate I/O address space (distinct from memory space), such as in and out.
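    The memory-mapped case can be modelled with a toy bus that routes a store purely by its physical address. Everything here is hypothetical (ToyBus, the 4 KiB DRAM size, the UART_TX register address); real address maps are defined by the chipset and firmware. It is only meant to show that one and the same store operation can end up in RAM or in a device.

```python
class ToyBus:
    """Routes a 32-bit store by physical address, roughly like an address decoder."""
    DRAM_SIZE = 0x1000       # hypothetical: 4 KiB of RAM starting at address 0
    UART_TX = 0x1000_0000    # hypothetical device register address

    def __init__(self):
        self.dram = bytearray(self.DRAM_SIZE)
        self.uart_output = []  # bytes "transmitted" by the fake device

    def store32(self, addr, value):
        data = (value & 0xFFFFFFFF).to_bytes(4, "little")
        if addr == self.UART_TX:
            # Same kind of store, but this address selects a device, not DRAM.
            self.uart_output.append(data[0])
        elif 0 <= addr <= self.DRAM_SIZE - 4:
            self.dram[addr:addr + 4] = data
        else:
            raise ValueError(f"no device decodes address {addr:#x}")

bus = ToyBus()
bus.store32(0x10, 0x12345678)       # lands in DRAM
bus.store32(ToyBus.UART_TX, 0x41)   # goes to the device instead
print(bus.dram[0x10:0x14].hex(), bus.uart_output)  # 78563412 [65]
```

    The in and out instructions differ in that the instruction itself, not the address, selects the I/O space.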


    Re: New title:

    Do computer programs/OSes consist of only the x86-64 instructions at low level?

    No, most programs and OSes also contain some static read-write data (a .data section) and read-only constants (a .rodata section), rather than consisting purely of code with constants only as immediate operands.

    But of course data doesn't "run", so maybe that's not what you meant. So yes, unless you want to play semantics with firmware.

    Drivers for some modern I/O devices need firmware binary blobs (part of which is machine code for the microcontroller embedded in the GPU, sound card, or whatever).

    From the OS's point of view, this is just binary data that it has to send to a PCIe device before it will respond to MMIO operations the way its documentation says it will. It doesn't matter to the OS how the non-CPU device uses that data internally, whether it's actually instructions for a microcontroller or whether it's just lookup tables and samples for a sound card's MIDI synthesizer.