assembly, compiler-construction, cpu-architecture, low-level

Do modern compilers need an assembler?


If I am not mistaken, all compilers have an assembler (ml, as, etc.) and use it in the background to translate high-level code into machine language (C/C++ code -> asm code -> machine code). Do modern compilers still work like that, or do they compile the high-level source code directly into machine code? In short, does MSVC use ml.exe, and does GCC use as, in the background?


Solution

  • It varies.

    • gcc does use the external as program: not "in the background", but as a separate pass operating on a temporary .s file written by the compiler or, if you use the -pipe option, on the compiler's output fed through a pipe. You can see the as command that is run if you compile with gcc -v.

    • clang has an "integrated assembler" which is used by default instead of as. However, if you switch it off with -fno-integrated-as, then it will run as separately and you can see this in clang -v output.

    • I believe that MSVC does not use a separate assembler, but I am not certain of this.
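If you want to check this on your own machine, here is a minimal Python sketch that compiles a trivial C file with -v and looks for a separate assembler step in the driver's trace. The helper names (`assembler_lines`, `driver_output`) are made up for this illustration, and matching on a command named `as` (or ending in `-as`, as cross toolchains often do) is an assumption about how your toolchain names its assembler.

```python
import os
import shlex
import shutil
import subprocess
import tempfile


def assembler_lines(verbose_output):
    """Return the lines of `-v` driver output whose command looks like `as`."""
    found = []
    for line in verbose_output.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: not a command line we can parse
        if not tokens:
            continue
        name = os.path.basename(tokens[0])
        # Assumption: the assembler is called `as` or `<target>-as`.
        if name == "as" or name.endswith("-as"):
            found.append(line.strip())
    return found


def driver_output(compiler, extra_flags=()):
    """Compile a trivial C file with `-v` and return what the driver printed."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.c")
        obj = os.path.join(tmp, "hello.o")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        result = subprocess.run(
            [compiler, "-v", "-c", src, "-o", obj, *extra_flags],
            capture_output=True, text=True, check=False,
        )
    return result.stderr  # gcc and clang both print the -v trace on stderr


if __name__ == "__main__":
    runs = [("gcc", ()), ("clang", ()), ("clang", ("-fno-integrated-as",))]
    for compiler, flags in runs:
        if shutil.which(compiler) is None:
            print(compiler, "is not installed, skipping")
            continue
        lines = assembler_lines(driver_output(compiler, flags))
        print(compiler, " ".join(flags), "->", lines or "no separate as step")
```

With a typical gcc you would expect one matching line, and with clang in its default mode none. What you actually see depends on how the toolchain was built and installed.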

    Note that if a compiler is going to support inline asm (as gcc and clang both do), then it can't very easily skip an assembler pass completely. Some stage of the process still has to know how to assemble every instruction mnemonic into machine code. In some cases, inline asm might expect to be able to interact with asm defined elsewhere in the file, and this is hard to support unless you have a pass where you truly generate the entire module into assembly, or at least into some pre-parsed asm-equivalent internal representation.
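To make the inline asm point concrete, here is a toy Python sketch of a two-stage pipeline. It is not how gcc or clang are implemented. The statement format and the function names are invented for this example, the assembly is written Intel-style, and the "assembler" knows only three x86 instructions (nop, ret, and mov eax, imm32). The "compiler" copies inline asm text through without understanding it, so only the assembler stage has to know the mnemonics.

```python
import struct


def assemble_line(line):
    """Translate one line of assembly text into x86 machine-code bytes."""
    text = line.strip()
    if not text:
        return b""
    mnemonic, _, operands = text.partition(" ")
    if mnemonic == "nop":
        return b"\x90"
    if mnemonic == "ret":
        return b"\xc3"
    if mnemonic == "mov":
        dest, src = [part.strip() for part in operands.split(",")]
        if dest == "eax":
            # mov eax, imm32 is opcode B8 followed by a little-endian 32-bit value
            return b"\xb8" + struct.pack("<I", int(src, 0))
    raise ValueError(f"unsupported instruction: {text!r}")


def assemble(asm_text):
    """The 'assembler' stage: the only code that knows instruction encodings."""
    return b"".join(assemble_line(line) for line in asm_text.splitlines())


def compile_function(statements):
    """The 'compiler' stage: lower statements to assembly text.

    ("return", n) is lowered by the compiler itself.
    ("asm", text) is inline asm and is copied through untouched.
    """
    lines = []
    for kind, value in statements:
        if kind == "return":
            lines.append(f"mov eax, {value}")
            lines.append("ret")
        elif kind == "asm":
            lines.append(value)  # the compiler never parses this text
        else:
            raise ValueError(f"unknown statement kind: {kind!r}")
    return "\n".join(lines) + "\n"


program = [("asm", "nop"), ("return", 42)]
asm_text = compile_function(program)
machine_code = assemble(asm_text)
print(asm_text, end="")
print(machine_code.hex())  # prints 90b82a000000c3
```

A compiler that wanted to emit the bytes directly would still need something equivalent to `assemble_line` for the ("asm", ...) case, which is why inline asm support tends to keep an assembler around in some form.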

    MSVC does not support inline assembly on x64, so it would not have this issue. Indeed, this might have been part of the reason not to support it.

    So it really just comes down to a design decision. There are some benefits to compiling directly to machine code:

    • better compilation performance, since no assembly text has to be written out and then parsed again

    • it might make certain micro-optimizations easier

    and some benefits to an external assembler:

    • avoids reinventing the wheel, if the system already has a working assembler

    • separation of concerns: the compiler doesn't have to know anything about machine code or object file format, the assembler doesn't have to know anything about the compiler's IR

    • easier to ensure 100% compatibility with code written for the existing assembler. For instance, clang occasionally has issues building source written for gcc/gas if it contains inline asm using obscure gas features, since the clang integrated assembler doesn't always support them compatibly.