I am trying to understand the compilation process in terms of a simple C++ program, as follows:
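#include <iostream>
#include <math.h>
using namespace std;
// Function-like macro: expanded as plain text by the preprocessor
#define sum(a,b) a+b
int a; // global variable
int main()
{
    int b; // local variable
    cin >> a >> b;
    cout << "hello" << endl;
    cout << pow(a,2) << endl;
    cout << sum(a,b);
    return 0;
}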
My understanding so far:
1) Preprocessing: All macros are expanded and their bodies are substituted into the code, e.g. sum(a,b). Function prototypes of all the functions we use in the program are added to the code, e.g. the pow() function from math.h.
2) Compiling: The preprocessed code is converted to assembly code and then into a single object file (this is in machine language).
3) Linking: Decides how memory should be allocated to the various sections of the code: global variables (int a) and local variables (int b).
In the case of static linking, function definitions from the various header files are added to the code too, e.g. the definition of pow() from math.h. Finally, one standalone executable file is generated.
In the case of dynamic linking, function definitions are not added. Finally, one executable file is generated, but it is not standalone.
Is my understanding wrong? What am I missing?
This is a very broad question in general, but I'll try to answer as briefly as possible. A typical language processing system has the following phases:
1. Preprocessing Phase - In this phase all preprocessor directives (such as #include and #define) and macros are handled, and code is generated that is free of them. This involves replacing each macro call with the macro body and replacing the formal parameters with the actual arguments. The substitution is purely textual.
2. Compilation Phase - This has several smaller phases, such as lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization and target code generation. The compilation phase may or may not produce assembly code; each approach has its own pros and cons. For this discussion we will assume that assembly code is produced.
3. Assembly Phase - The assembler converts the compiler's output into target (machine) code, typically an object file. Assemblers can be one-pass or two-pass in nature.
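4. Linking Phase - The object code produced so far contains references and calls to subroutines that are defined in other modules or libraries. Those modules are linked with the code in this phase, and the instructions that refer to outside symbols are assigned their final addresses. For instance, the header math.h only declares pow(); its definition comes from a library that the linker resolves the call against, not from the header itself.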
5. Loading Phase - In this phase, the segments produced by the previous phase are loaded into RAM for execution, and control is passed to the first instruction.
Each component listed in this answer has many intricacies and sub-parts, and this answer is in no way a complete explanation of a language processor.
There are useful books on these topics by authors such as D. M. Dhamdhere, Tanenbaum and Alfred Aho.