compilation, compiler-errors, compiler-construction

Error detection in compilers


I am learning about compilers, specifically two-phase compilers, and I am confused about which phases detect which errors. Let's say we have something like:

int x, y;

x = x + y + z;

Here we are trying to reference a variable, z, that has not been declared. I think this error would be detected in the front-end of the compiler, but I don't know which part of the front-end would detect it.

The three parts of the front-end are the scanner, the parser, and the context-sensitive (semantic) analyzer.

The scanner reads every character in a statement and splits the statement into tokens. I could be wrong, but I don't think the error would be detected here.

The parser checks whether the statement is syntactically correct. Here is where I start to get confused. Even though z is undeclared, the syntax of the statement is technically correct, so would the error not be detected here either?

The context-sensitive analyzer uses the symbol table and syntax tree to check that the program is semantically consistent with the language definition, and it also does type checking. Would the error be detected here? At this point the compiler would look in the symbol table and notice that z doesn't have a type (or that it's not in there at all?).

Or is this something that would be detected by the back-end of the compiler? If it is the back-end, I don't understand why that is the case. Any clarification would be highly appreciated. Thanks.


Solution

  • This is ultimately compiler-dependent, but typically this error is caught during semantic analysis, which is still part of the compiler front-end.

    With a traditional compiler, this couldn't be done in the scanning phase, because scanners are built on finite automata and the language of "strings that represent proper variable scoping" isn't regular. It also typically isn't done as part of parsing: the parser's job is to build the AST, a declare-before-use rule over arbitrary names isn't context-free either, and in a bottom-up parser the scoping information wouldn't yet be available at the time the parser determines the structure of the code.

    However, the semantic analyzer has all the information necessary to find this error: it has the AST, can use it to build a symbol table, and can then walk through all the expressions in the code and notice that z isn't anywhere in that symbol table.
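
To make the division of labor concrete, here is a minimal sketch in Python of the three front-end steps applied to the snippet from the question. It is not how any particular compiler is written. The grammar (only `int` declarations and `+` assignments) and the names `scan`, `parse`, and `check` are assumptions made up for illustration. The point is that the scanner and parser both accept the input, and only the pass that consults a symbol table reports z.

```python
import re

# Hypothetical token pattern for the tiny C-like fragment in the question.
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<punct>[=+,;]))")


def scan(source):
    """Scanner: split text into (kind, text) tokens. It knows nothing about scope."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group("ident"):
            text = m.group("ident")
            kind = "keyword" if text == "int" else "ident"
        else:
            text, kind = m.group("punct"), "punct"
        tokens.append((kind, text))
        pos = m.end()
    return tokens


def parse(tokens):
    """Parser: check syntax and build a small AST. Still no scope checks."""
    ast, i = [], 0

    def expect(kind, text=None):
        nonlocal i
        ok = i < len(tokens) and tokens[i][0] == kind
        if ok and text is not None:
            ok = tokens[i][1] == text
        if not ok:
            raise SyntaxError(f"expected {text or kind} at token {i}")
        i += 1
        return tokens[i - 1][1]

    def name_list(separator):
        # ident (separator ident)*
        names = [expect("ident")]
        while i < len(tokens) and tokens[i] == ("punct", separator):
            expect("punct", separator)
            names.append(expect("ident"))
        return names

    while i < len(tokens):
        if tokens[i] == ("keyword", "int"):
            expect("keyword", "int")
            ast.append(("decl", name_list(",")))
        else:
            target = expect("ident")
            expect("punct", "=")
            ast.append(("assign", target, name_list("+")))
        expect("punct", ";")
    return ast


def check(ast):
    """Semantic analysis: walk the AST with a symbol table and collect errors."""
    symbols, errors = {}, []
    for node in ast:
        if node[0] == "decl":
            for name in node[1]:
                symbols[name] = "int"
        else:
            _, target, operands = node
            for name in [target, *operands]:
                if name not in symbols:
                    errors.append(f"undeclared identifier '{name}'")
    return errors


source = "int x, y;\nx = x + y + z;"
tokens = scan(source)   # succeeds: z is just another identifier token
ast = parse(tokens)     # succeeds: the statement is syntactically valid
errors = check(ast)     # only here does the missing declaration show up
print(errors)           # ["undeclared identifier 'z'"]
```

In this sketch the token for z is indistinguishable from the tokens for x and y, and the parser builds the same kind of AST node whether or not z was declared. A real compiler would also track nested scopes and types in the symbol table, which this sketch leaves out.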