Tags: gcc, compilation, programming-languages, cpu, machine-code

Does the compiler actually produce Machine Code?


I've been reading that in most cases (as with gcc) the compiler reads source code in a high-level language and emits the corresponding machine code. Now, machine code by definition is code that a processor can understand directly. So machine code should be only machine (processor) dependent and OS independent. But this is not the case: even if two different operating systems are running on the same processor, I cannot run the same compiled file (a .exe on Windows or an a.out on Linux) on both operating systems.

So, what am I missing? Is the output of gcc (and most compilers) not machine code? Or is machine code not the lowest level of code, and does the OS translate it further into a set of instructions that the processor can execute?


Solution

  • You are confusing a few things. A retargetable compiler like gcc (and most generic compilers) compiles source code into object files; the linker then combines those objects with whatever libraries are needed to produce a so-called binary, which the operating system can read, parse, load block by block into memory, and start executing.

    A sane compiler author will have the compiler output assembly language; the compiler driver (or the user, in a makefile) then calls the assembler, which creates the object file. This is how gcc works, and roughly how clang works too, although llc can now produce object files directly rather than only assembly that must then be assembled. (Does a compiler always produce an assembly code?)

    It makes far more sense to generate debuggable assembly language than to emit raw machine code directly. You really need a good reason, such as a JIT, to skip that step. I would avoid toolchains that go straight to machine code just because they can: they are harder to maintain and more likely to have bugs, or to take longer to fix them.


    If the architecture is the same, there is no reason a generic toolchain cannot generate code for incompatible operating systems; the GNU tools, for example, can do this. Operating system differences are not, by definition, at the machine-code level. Most are at the high-level-language level: the C libraries you use to create GUI windows, etc., have nothing to do with machine code or the processor architecture, and for some operating systems the same OS-specific C code can be used on MIPS, ARM, PowerPC, or x86. Where the architecture becomes specific is in the mechanism by which actual system calls are invoked; a specific instruction is often used. Machine code is eventually involved, yes, but there is no reason this cannot be written in real or inline assembly.

    And then this leads to libraries. Even fopen and printf, which are generic C calls, eventually have to make a system call, so while much of the library support code can be written in a high-level language that is portable across systems, there must be a small system- and architecture-specific bit of code for the last mile. You can see this in the glibc sources, or in the hooks into newlib used by other library solutions, as examples.

    The same is true for other languages such as C++ as it is for C. Interpreted languages add extra layers, but their virtual machines are just programs that sit on similar layers.

    Low-level programming doesn't mean machine or assembly language; it just means that whatever programming language you are using gives access at a lower level: below the application, below the operating system, and so on.