compiler-construction, runtime, lexical-analysis

How does a compiler handle line numbers in runtime error messages?


Almost all compilers return a line number along with an error message. From a compiler design perspective, how does the compiler handle line number information in each of the following phases? Thanks.

  • Scanner
  • Parser
  • AST data structure
  • Code generation

In addition:

  • Run time environment
  • Machine interpreter

Solution

  • I implemented a fairly simple compiler for a class assignment. The source language was a subset of Pascal with a few additional limitations.

    A compiler is a tool that translates one language into another. It checks the input for errors and then generates output code (if possible). The pipeline of a compiler is usually roughly:

    Input Code -> Lexical Analyzer (Scanner) -> Syntax Analyzer (Parser) -> Semantic Analyzer -> Code Generator -> Output Code *

    Since my language was simple, I could make a number of assumptions, e.g. that an instruction always fits on one line. My lexer used regular expressions to detect characters that should not be there, i.e. anything other than digits, letters, "(", ",", "." and so on. I read the file into a list of strings, one string per line, so if I scan a line and find an error, I report index + 1, which is the line number.
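
    The line-by-line scan described above can be sketched roughly as follows. This is only an illustration, not the original assignment code: the set of allowed characters in `ILLEGAL_CHAR` and the function name `find_lexical_errors` are assumptions made for the example.

```python
import re

# Hypothetical character set for a tiny Pascal-like language:
# anything NOT listed inside the brackets is treated as illegal.
ILLEGAL_CHAR = re.compile(r"[^A-Za-z0-9\s()+\-*/=<>:;,.']")


def find_lexical_errors(source_lines):
    """Return (line_number, column, character) for every illegal character."""
    errors = []
    for index, line in enumerate(source_lines):
        for match in ILLEGAL_CHAR.finditer(line):
            # index is 0-based, so the line number shown to the user is index + 1
            errors.append((index + 1, match.start() + 1, match.group()))
    return errors


# The file is held as a list of strings, one string per source line.
source = "x := 1;\ny := x # 2;\nwriteln(y).".splitlines()
for line_no, col, ch in find_lexical_errors(source):
    print(f"line {line_no}, column {col}: unexpected character {ch!r}")
```

    Keeping the column next to the line number costs nothing here, since the regex match already knows its offset within the line.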

    The syntax analyzer (parser), which checks e.g. that a variable name starts with a letter, used a similar approach.

    When I extended the parser, I associated each token with its line in the source so that the line could be reported in case of an error.
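
    A minimal sketch of that idea, assuming a toy grammar with only `IDENT := NUMBER ;` statements. The names `Token`, `tokenize`, `parse_assignment` and `ParseError` are made up for this example. The scanner stamps every token with its line, and the parser simply reads the line from the offending token when it fails.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    text: str
    line: int  # 1-based source line on which the token appears


TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<OP>:=|[;+\-*/()])")


class ParseError(Exception):
    pass


def tokenize(source):
    """Scan line by line and record the line number on every token."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        # finditer skips anything that is not a token (whitespace, in this toy example)
        for m in TOKEN_RE.finditer(line):
            tokens.append(Token(m.lastgroup, m.group(), line_no))
    return tokens


def parse_assignment(tokens, pos=0):
    """Parse `IDENT := NUMBER ;` starting at pos and return the next position."""
    expected = [("IDENT", None), ("OP", ":="), ("NUMBER", None), ("OP", ";")]
    for kind, text in expected:
        if pos >= len(tokens):
            last_line = tokens[-1].line if tokens else 1
            raise ParseError(f"line {last_line}: unexpected end of input")
        tok = tokens[pos]
        if tok.kind != kind or (text is not None and tok.text != text):
            wanted = text if text is not None else kind
            # The line number comes straight from the token that failed to match.
            raise ParseError(f"line {tok.line}: expected {wanted} but found {tok.text!r}")
        pos += 1
    return pos


tokens = tokenize("x := 1;\ny := ;")
try:
    pos = 0
    while pos < len(tokens):
        pos = parse_assignment(tokens, pos)
except ParseError as err:
    print(err)
```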

    I don't know how modern compilers solve this problem, but my guess is that they also associate AST nodes with line numbers, bearing in mind that one AST node can span several lines (that is language dependent).

    By the time code generation runs, the compiler knows that the code is correct (to the best of its knowledge), so an error at this stage would not be about the code but about the compiler or the process (a bug, not enough memory, an output location that cannot be written, etc.).

    Runtime environments and machine interpreters can also contain a compiler, e.g. a JIT, but an error message it returns usually indicates a bug in the compiler or runtime, not in the code.

    *Note that this is a very simple model of 3 passes. Modern compilers have many more.

    EDIT: I found that each AST node has fields that indicate the line number and the file, which are used when reporting an error.
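
    A rough sketch of what such per-node fields might look like. The `Node` layout, the `check_undeclared` pass and the file name `demo.pas` are all hypothetical and only illustrate the idea. Because every node carries its own position, a statement that starts on one line can still report an error on the exact line of the offending sub-expression.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                  # e.g. "Assign", "Add", "Var"
    value: str = ""
    filename: str = "<input>"  # file the node came from
    line: int = 0              # 1-based line where the node starts
    children: List["Node"] = field(default_factory=list)


def check_undeclared(node, declared, errors=None):
    """Walk the tree and record every use of a name that is not in `declared`."""
    if errors is None:
        errors = []
    if node.kind == "Var" and node.value not in declared:
        # The position is read from the node itself, not from the enclosing statement.
        errors.append(f"{node.filename}:{node.line}: undeclared variable {node.value!r}")
    for child in node.children:
        check_undeclared(child, declared, errors)
    return errors


# An assignment that starts on line 3 and continues onto line 4.
tree = Node("Assign", filename="demo.pas", line=3, children=[
    Node("Var", "total", "demo.pas", 3),
    Node("Add", filename="demo.pas", line=3, children=[
        Node("Var", "total", "demo.pas", 3),
        Node("Var", "cnt", "demo.pas", 4),
    ]),
])

for message in check_undeclared(tree, {"total"}):
    print(message)
```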