Search code examples
parsingjavacc

Parser example of JavaCC for arithmetic expressions


It is my first day in parsing area. As my first example in JavaCC, the code is

SKIP:  { " " | "\t" | "\n" | "\r"                    }
TOKEN: { "(" | ")" | "+" | "*" | <NUM: (["0"-"9"])+> }

void S(): {} { E() <EOF>           }
void E(): {} { T() ("+" T())*      }
void T(): {} { F() ("*" F())*      }
void F(): {} { <NUM> | "(" E() ")" }

I wonder why it can handle the case like int+(int+int). By its algorithm, I think it would parse this expression as [int] & [(int] & [int)] as Ts and fail to parse. Why it got parsed as intended?


Solution

  • Consider the input string "1+(2+3)" this lexes as a sequence of tokens

    <NUM> "+" "(" <NUM> "+" <NUM> ")" <EOF>
    

    This sequence can be derived from S() as follows. The . shows which tokens have been consumed. As tokens are consumed, the . moves right

             . S() 
               ~~~ expand
         ==> . E() <EOF>
               ~~~ expand
         ==> . T() ("+" T())* <EOF>
               ~~~ expand
         ==> . F() ("*" F())* ("+" T())* <EOF>
               ~~~ expand
         ==> . (<NUM> | "(" E() ")") ("*" F())* ("+" T())* <EOF>
               ~~~~~~~~~~~~~~~~~~~~~ choose first and consume
         ==> <NUM> . ("*" F())* ("+" T())* <EOF>
                     ~~~~~~~~~~ delete
         ==> <NUM> . ("+" T())* <EOF>
                     ~~~~~~~~~~ unroll and consume
         ==> <NUM> "+" . T() ("+" T())* <EOF>
                         ~~~ expand
         ==> <NUM> "+" . F() ("*" F())* ("+" T())* <EOF>
                         ~~~~
         ==> <NUM> "+" . (<NUM> | "(" E() ")") ("*" F())* ("+" T())* <EOF>
                         ~~~~~~~~~~~~~~~~~~~~~ choose second and consume
         ==> <NUM> "+" "(" . E() ")" ("*" F())* ("+" T())* <EOF>
                             ~~ expand
         ==> <NUM> "+" "(" . T() ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                             ~~~ expand
         ==> <NUM> "+" "(" . F() ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                             ~~~ expand
         ==> <NUM> "+" "(" . (<NUM> | "(" E() ")") ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                             ~~~~~~~~~~~~~~~~~~~~~ choose first and consume
         ==> <NUM> "+" "(" <NUM> . ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                   ~~~~~~~~~~ delete
         ==> <NUM> "+" "(" <NUM> . ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                   ~~~~~~~~~~~~~~ unroll and consume
         ==> <NUM> "+" "(" <NUM> "+" . T() ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                       ~~~ expand
         ==> <NUM> "+" "(" <NUM> "+" . F() ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                       ~~~ expand
         ==> <NUM> "+" "(" <NUM> "+" . (<NUM> | "(" E() ")") ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                       ~~~~~~~~~~~~~~~~~~~~~ choose first and consume
         ==> <NUM> "+" "(" <NUM> "+" <NUM> . ("*" F())* ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                             ~~~~~~~~~~ delete
         ==> <NUM> "+" "(" <NUM> "+" <NUM> . ("+" T())* ")" ("*" F())* ("+" T())* <EOF>
                                             ~~~~~~~~~~ delete and consume
         ==> <NUM> "+" "(" <NUM> "+" <NUM> ")" . ("*" F())* ("+" T())* <EOF>
                                                 ~~~~~~~~~~ delete
         ==> <NUM> "+" "(" <NUM> "+" <NUM> ")" . ("+" T())* <EOF>
                                                 ~~~~~~~~~~ delete and consume
         ==> <NUM> "+" "(" <NUM> "+" <NUM> ")" <EOF> .
    

    Key:

    • expand: Replace a nonterminal with its definition.
    • choose: Replace (S|T) with either S or T
    • unroll: Replace (S)* with S (S)*
    • delete: Replace (S)* with nothing

    The derivation above is a left-right derivation. I choose to show the left-right derivation because it's reflective of how JavaCC works.