c++ compiler-construction linker machine-translation

What jobs does a typical C++ compiler handle?


After researching compilers and how they work, I learned that the process is often broken into four steps: preprocessor, compiler, assembler and linker. I envisioned each step as its own separate program: a preprocessor program, a compiler program, an assembler program and a linker program. However, it turns out that sometimes generating assembly code and producing object files are both handled by the compiler program, and sometimes they are not. It seems to depend very much on the context and the programming language used. My question, then, is: how is the translation process typically broken up when translating C++ source code into machine code?

  1. Is the preprocessor a separate program from the compiler? Or is that process usually a part of the compiler program?
  2. What is the compiler typically responsible for? Generating assembly code and then converting it to machine code?
  3. Is the linker its own separate program that is run after the compiler finishes?

Side note: My question is different from other C++ compiler threads because I'm asking not only how a compiler works but also whether certain other processes, such as linking, are their own executable programs or are typically built into the compiler program.


Solution

  • Modern compilers (at least gcc and clang, and I doubt others differ much) combine preprocessing and compilation in one executable. This is mainly so that the compiler can generate good error messages [ones that point to the right line and column and, when macros are involved, can say "Called from macro FOO(x)"]. Knowing "what file we're in" is easier when the compiler has the actual source code to look at rather than pre-processed code.
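
    To make the position bookkeeping concrete, here is a minimal sketch of how source locations relate to macro expansion. Location, HERE and the file name demo.cpp are made-up names for illustration. The standard #line directive plays the same role as the line markers that a stand-alone preprocessing pass writes into its output so that later stages can still report original positions.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record of a source position, for illustration only.
struct Location {
    const char* file;
    int line;
};

// Hypothetical macro: __FILE__ and __LINE__ expand at the place where
// HERE() is used, not where the macro is defined.
#define HERE() Location{__FILE__, __LINE__}

// The standard #line directive renumbers what follows; the line directly
// after it is treated as line 100 of "demo.cpp".
#line 100 "demo.cpp"
const Location first = HERE();
const Location second = HERE();

// Small self-check of the positions recorded above.
inline void check_locations() {
    assert(first.line == 100);
    assert(second.line == 101);
    assert(std::strcmp(first.file, "demo.cpp") == 0);
}
```

    Each use of HERE() records the line it was written on, which is the kind of information a diagnostic needs in order to point back into the original file.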

    The linker is typically a separate program. In LLVM, which is the compiler I know best, the assembler is only used for inline assembly code [typically as an integrated part of the compiler]; otherwise the compiler generates machine code directly without going through an assembler, so a fully formed object file comes out of the compiler. [GCC, by contrast, writes assembly text and has its driver run the separate assembler, as, on it.]

    Unless you ask it to stop earlier, the compiler driver then calls the linker, which is a separate executable. The linker combines the object file with the runtime library and the start-up code that runs "before main" (global object construction and similar, as well as "preparing to call main"). This produces the executable file.
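
    The "before main" work can be observed from ordinary C++. The following is a minimal sketch; event_log, Tracer and global_tracer are hypothetical names, and it assumes a mainstream toolchain where globals in the program are constructed before main() is entered.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log. A function-local static is used so the vector
// is guaranteed to exist before the first global constructor touches it.
inline std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical type whose constructor records that it ran.
struct Tracer {
    Tracer() { event_log().push_back("global constructor"); }
};

// On mainstream toolchains this object is constructed by the start-up
// code, before main() is entered.
Tracer global_tracer;

// Intended to be called as the first statement of main().
inline void check_constructed_before_main() {
    assert(event_log().size() == 1);
    assert(event_log().front() == "global constructor");
}
```

    Calling check_constructed_before_main() as the first statement of main() should pass, because the entry was already logged by code that the linker arranged to run ahead of main.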

    With other options, the compiler produces just an object file (the -c option), or the generated code as symbolic assembly text (the -S option).
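
    A small file like the one below can be used to watch the individual stages. This is only a sketch: stages.cpp, SQUARE and square_of are made-up names, and the commands in the comments use the -E, -S and -c flags of the gcc/clang drivers.

```cpp
// stages.cpp (hypothetical file name): a small translation unit for
// watching the stages one at a time with the gcc/clang driver flags:
//
//   g++ -E stages.cpp -o stages.ii   # preprocess only
//   g++ -S stages.cpp -o stages.s    # stop after emitting assembly text
//   g++ -c stages.cpp -o stages.o    # stop after emitting an object file
#include <cassert>

// After -E this macro is gone; only its expansion remains.
#define SQUARE(x) ((x) * (x))

// Shows up as a function symbol in stages.s and stages.o.
int square_of(int value) {
    return SQUARE(value);
}

// Small self-check that the expansion computes what it should.
inline void check_square() {
    assert(square_of(7) == 49);
}
```

    Comparing stages.ii, stages.s and stages.o shows what each stage contributes, even though one driver command normally runs them all.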

    The backend of the compiler, which is responsible for code generation, typically also handles optimisation, along with various code transformations that help the optimisation stages. For example, Clang + LLVM will produce "uniform" loops, no matter whether you used while, for or goto to write the loop.

    This saves the more advanced stages from having to recognise many different forms of loop, and lets the compiler generate "good" code regardless of how the programmer wrote the loop. [Of course, if you make the loop complicated enough, the compiler will probably not quite figure out how it works and will not optimise it as well, but for straightforward conversions between the basic forms, the final generated code will be the same regardless of what the source looked like.]
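
    As an illustration of the "basic forms" mentioned above, here is the same loop written three ways. The function names are hypothetical. The sketch only checks that the three forms compute the same result; whether a given compiler emits identical machine code for them is something to confirm by inspecting its output.

```cpp
#include <cassert>

// Sum of 0 .. n-1 written with a for loop.
int sum_with_for(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

// The same loop written with while.
int sum_with_while(int n) {
    int total = 0;
    int i = 0;
    while (i < n) {
        total += i;
        ++i;
    }
    return total;
}

// The same loop written with goto and labels.
int sum_with_goto(int n) {
    int total = 0;
    int i = 0;
loop_head:
    if (i >= n) goto loop_exit;
    total += i;
    ++i;
    goto loop_head;
loop_exit:
    return total;
}

// Small self-check that all three forms agree.
inline void check_loops_agree() {
    for (int n = 0; n <= 10; ++n) {
        assert(sum_with_for(n) == sum_with_while(n));
        assert(sum_with_for(n) == sum_with_goto(n));
    }
    assert(sum_with_for(10) == 45);
}
```

    Compiling this with optimisation enabled and the -S option, then comparing the three function bodies in the assembly output, is one way to see how uniform the generated code is.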