Search code examples
parsingsyntaxcompiler-constructionsemanticsrecursive-descent

Syntax analysis and semantic analysis


I am wondering how the syntax analysis and semantic analysis work.

I have finished the lexer and the grammar construction of my interpreter.

Now I am going to implement a recursive descent (top down) parser for this grammar

For example, I have the following grammar:

<declaration>  ::=   <data_type> <identifier> ASSIGN <value>

so i coded it like this (in java):

public void declaration(){
    data_type();
    identifier();
    if(token.equals("ASSIGN")){
        lexer();   //calls next token
        value();
    } else {
        error();
    }
}

Assuming I have three data types: Int, String and Boolean. Since the values for each data types are different, (ex. true or false only in Boolean) how can I determine if it fits the data type correctly? What part of my code would determine that?

I am wondering where would I put the code to:

1.) call the semantic analysis part of my program. 
2.) store my variables into the symbol table.

Do syntax analysis and semantic analysis happen at the same time? or do i need to finish the syntax analysis first, then do the semantic analysis?

I am really confused. Please help.

Thank you.


Solution

  • You can do syntax analysis (parsing) and semantic analysis (e.g., checking for agreement between <data_type> and <value>) at the same time. For example, when declaration() calls data_type(), the latter could return something (call it DT) that indicates whether the declared type is Int, String or Boolean. Similarly, value() could return something (VT) that indicates the type of the parsed. Then declaration() would simply compare DT and VT, and raise an error if they don't match. (Alternatively, value() could take a parameter indicating the declared type, and it could do the check.)

    However, you may find it easier to completely separate the two phases. To do so, you would typically have the parsing phase build a parse tree (or abstract syntax tree). So your top level would call (e.g.) program() to parse a whole program, which would return a tree representing (the syntax of) the program, and you would pass that tree to a semantic_analysis() routine, which would traverse the tree, extracting pertinent information and enforcing semantic constraints.