I took an interest in finding out how a compiler really works. I looked through several books and all of them agree on the fact that the compiler phases are roughly as this(correct me if I'm wrong): lexical analysis, syntax analysis, semantic analysis, intermediate code, code optimization, code generation. The lexical and syntax phases look pretty clear and straightforward as methods(but this does not mean easy of course). However, I'm still not able to find what the semantic phase really consist of. For one, I know that there should be some subphases like scope checking, declaration checking and type checking but question that has been bothering me is: are there other things that have to be done. Can you tell me what are the mandatory steps that have to taken during this phase. I know this strongly depends on the programming language and the compiler implementation but could you give me some examples concerning C/C++, Java. And could you please point me to a book/page/article where can I read those things in-depth. Thanks.
Edit: The books I look through were "Compilers: Principles, Techniques, and Tools",Aho and "Modern Compiler Design", Grune, Reeuwijk. I haven't been able to answer this question using them. If you find this question too broad could you please give an answer considering an compiler implementation of your choice for either C,C++ or Java.
There are typical "semantic analysis" phases that many compilers go through in one form or another. After lexing and parsing, the following actions typically occur in this order:
Name and type resolution. Determines lexical scopes, identifiers declared in such scopes, the type information for those identifiers, and for each non-declaration use of an identifier, the declaration to which it refers
Control flow analysis. The construction of a control flow graph over the computations explicit and/or implied (e.g., constructors) by the code.
Data flow analysis. Determines where variables recieve new values, and where those values are read by other parts of the program. (This often has a local analysis done within procedures, followed possibly by one across the procedures).
Also often done, as part of data flow analysis:
Points-to analysis. Determination for each pointer, at each location in the code, which entities that pointer might reference
Call graph. Construction of a call graph across the procedures, often taking into account indirect function pointers whose estimated values occur during the points-to analysis.
As a practical matter, some of these need to be interleaved to produce better results.
Beyond this, there are many analyses used to support various optimizations and code generation passes. If you really want to know more, consult any decent compiler book.