JavaCC Problem - Generated code doesn't find all parse errors


I just started with JavaCC, and I am seeing some strange behaviour. I want to verify input in the form of tokens (letters and numbers) which are joined by operators (+, -, /) and which can contain parentheses. I hope that was understandable :)

The main method contains a string that should produce an error, because it has one opening but two closing parentheses, yet I do not get a parse exception. Why?

Does anybody have a clue why I don't get the exception?

In my first attempt I struggled with left recursion and choice conflicts, but managed to get past them. Maybe that is where I introduced the problem?

Oh - and maybe my solution is not very good - ignore this fact... or better, give some advice ;-)

File: CodeParser.jj

 options {
   STATIC=false;
 }

 PARSER_BEGIN(CodeParser)

 package com.testing;

 import java.io.StringReader;
 import java.io.Reader;

 public class CodeParser {

     public CodeParser(String s) 
     {
         this((Reader)(new StringReader(s))); 

     }

     public static void main(String args[])
     {
         try
         {
               /** String has one open, but two closing parenthesis --> should produce parse error */
               String s = "A+BC+-(2XXL+A/-B))";
               CodeParser parser = new CodeParser(s);
               parser.expression();
         }
         catch(Exception e)
         {
               e.printStackTrace();
         }
     }
 }
 PARSER_END(CodeParser)

 TOKEN:
 {
  <code : ("-")?(["A"-"Z", "0"-"9"])+ >
  | <op : ("+"|"/") >
  | <not : ("-") >
  | <lparenthesis : ("(") >
  | <rparenthesis : (")") >
 }

 void expression() :
 {
 }
 {
  negated_expression() | parenthesis_expression() | LOOKAHEAD(2) operator_expression() | <code>
 }

 void negated_expression() :
 {
 }
 {
       <not>parenthesis_expression()
 }

 void parenthesis_expression() :
 {
 }
 {
        <lparenthesis>expression()<rparenthesis>
 }

 void operator_expression() :
 {
 }
 {
       <code><op>expression()
 }

Edit - 11/16/2009

Now I gave ANTLR a try.

I changed some terms to better match my problem domain. I came up with the following code (using the answers on this site), which seems to do the work now:

grammar Code;

CODE    :   ('A'..'Z'|'0'..'9')+;
OP  :   '+'|'/';

start   :   terms EOF;
terms   :   term (OP term)*;
term    :   '-'? CODE
    |   '-'? '(' terms ')';

And by the way... ANTLRWorks is a great tool for debugging and visualizing grammars! It helped me a lot.

Additional info
Above code matches stuff like:

(-Z19+-Z07+((FV+((M005+(M272/M276))/((M278/M273/M642)+-M005)))/(FW+(M005+(M273/M278/M642)))))+(-Z19+-Z07+((FV+((M005+(M272/M276))/((M278/M273/M642/M651)+-M005)))/(FW+(M0))))
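
For comparison, the three ANTLR rules above can be mirrored by a small hand-written recursive-descent checker in plain Java. This is only a sketch under assumptions: the class name `TermsChecker` and the method `matches` are made up for illustration, it works directly on characters instead of using a separate lexer, and (like the grammar) it does not skip whitespace. The point it shows is the `start : terms EOF` rule: the input is accepted only if `terms` succeeds and nothing is left over.

```java
class TermsChecker {
    private final String in;
    private int pos = 0;

    private TermsChecker(String in) {
        this.in = in;
    }

    // start : terms EOF;
    // The comparison with input.length() plays the role of EOF.
    static boolean matches(String input) {
        TermsChecker checker = new TermsChecker(input);
        return checker.terms() && checker.pos == input.length();
    }

    // terms : term (OP term)*;
    private boolean terms() {
        if (!term()) {
            return false;
        }
        while (peek() == '+' || peek() == '/') {
            pos++;
            if (!term()) {
                return false;
            }
        }
        return true;
    }

    // term : '-'? CODE | '-'? '(' terms ')';
    private boolean term() {
        if (peek() == '-') {
            pos++;
        }
        if (peek() == '(') {
            pos++;
            if (!terms() || peek() != ')') {
                return false;
            }
            pos++;
            return true;
        }
        int start = pos;
        while (isCodeChar(peek())) {
            pos++;
        }
        return pos > start; // CODE needs at least one character
    }

    // Returns '\0' once the input is exhausted.
    private char peek() {
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    private static boolean isCodeChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        System.out.println(matches("A+BC+-(2XXL+A/-B)"));  // balanced
        System.out.println(matches("A+BC+-(2XXL+A/-B))")); // one ')' too many
    }
}
```

With this sketch the balanced string is accepted and the string with the extra closing parenthesis is rejected, because the end-of-input comparison fails while one `)` is still unread.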

Solution

  • What kgregory says is the right answer. You can see this if you build the grammar with the DEBUG_PARSER option and then run it:

    $ javacc -debug_parser -output_directory=com/testing/ CodeParser.jj && javac com/testing/*.java && java -cp . com.testing.CodeParser
    Java Compiler Compiler Version 5.0 (Parser Generator)
    (type "javacc" with no arguments for help)
    Reading from file CodeParser.jj . . .
    File "TokenMgrError.java" is being rebuilt.
    File "ParseException.java" is being rebuilt.
    File "Token.java" is being rebuilt.
    File "SimpleCharStream.java" is being rebuilt.
    Parser generated successfully.
    Call:   expression
      Call:   operator_expression
        Consumed token: <<code>: "A" at line 1 column 1>
        Consumed token: <<op>: "+" at line 1 column 2>
        Call:   expression
          Call:   operator_expression
            Consumed token: <<code>: "BC" at line 1 column 3>
            Consumed token: <<op>: "+" at line 1 column 5>
            Call:   expression
              Call:   negated_expression
                Consumed token: <"-" at line 1 column 6>
                Call:   parenthesis_expression
                  Consumed token: <"(" at line 1 column 7>
                  Call:   expression
                    Call:   operator_expression
                      Consumed token: <<code>: "2XXL" at line 1 column 8>
                      Consumed token: <<op>: "+" at line 1 column 12>
                      Call:   expression
                        Call:   operator_expression
                          Consumed token: <<code>: "A" at line 1 column 13>
                          Consumed token: <<op>: "/" at line 1 column 14>
                          Call:   expression
                            Consumed token: <<code>: "-B" at line 1 column 15>
                          Return: expression
                        Return: operator_expression
                      Return: expression
                    Return: operator_expression
                  Return: expression
                  Consumed token: <")" at line 1 column 17>
                Return: parenthesis_expression
              Return: negated_expression
            Return: expression
          Return: operator_expression
        Return: expression
      Return: operator_expression
    Return: expression
    

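    See that? The last token consumed is the second-to-last character, the first of the two closing parentheses. The parser returns as soon as it has matched one complete expression and never looks at the final ")".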

    If you want the exception, again, like kgregory said, you could add a new top-level production called "file" or "data" or something and end it with an <EOF> token. That way any dangling parens like this would cause an error. Here's a grammar that does that:

    options {
      STATIC=false;
    }
    
    PARSER_BEGIN(CodeParser)
    package com.testing;
    
    import java.io.StringReader;
    import java.io.Reader;
    
    public class CodeParser {
    
        public CodeParser(String s) 
        {
            this((Reader)(new StringReader(s))); 
    
        }
    
        public static void main(String args[])
        {
            try
            {
                  /** String has one open, but two closing parenthesis --> should produce parse error */
                  String s = "A+BC+-(2XXL+A/-B))";
                  CodeParser parser = new CodeParser(s);
                  parser.file();
            }
            catch(Exception e)
            {
                  e.printStackTrace();
            }
        }
    }
    PARSER_END(CodeParser)
    
    TOKEN:
    {
            <code : ("-")?(["A"-"Z", "0"-"9"])+ >
            | <op : ("+"|"/") >
            | <not : ("-") >
            | <lparenthesis : ("(") >
            | <rparenthesis : (")") >
    }
    
    void file() : {} {
      expression() <EOF>
    }
    void expression() :
    {
    }
    {
            negated_expression() | parenthesis_expression() | LOOKAHEAD(2) operator_expression() | <code>
    }
    
    void negated_expression() :
    {
    }
    {
          <not>parenthesis_expression()
    }
    
    void parenthesis_expression() :
    {
    }
    {
           <lparenthesis>expression()<rparenthesis>
    }
    
    void operator_expression() :
    {
    }
    {
          <code><op>expression()
    }
    

    And a sample run:

    $ javacc -debug_parser -output_directory=com/testing/ CodeParser.jj && javac com/testing/*.java && java -cp . com.testing.CodeParser
    Java Compiler Compiler Version 5.0 (Parser Generator)
    (type "javacc" with no arguments for help)
    Reading from file CodeParser.jj . . .
    File "TokenMgrError.java" is being rebuilt.
    File "ParseException.java" is being rebuilt.
    File "Token.java" is being rebuilt.
    File "SimpleCharStream.java" is being rebuilt.
    Parser generated successfully.
    Call:   file
      Call:   expression
        Call:   operator_expression
          Consumed token: <<code>: "A" at line 1 column 1>
          Consumed token: <<op>: "+" at line 1 column 2>
          Call:   expression
            Call:   operator_expression
              Consumed token: <<code>: "BC" at line 1 column 3>
              Consumed token: <<op>: "+" at line 1 column 5>
              Call:   expression
                Call:   negated_expression
                  Consumed token: <"-" at line 1 column 6>
                  Call:   parenthesis_expression
                    Consumed token: <"(" at line 1 column 7>
                    Call:   expression
                      Call:   operator_expression
                        Consumed token: <<code>: "2XXL" at line 1 column 8>
                        Consumed token: <<op>: "+" at line 1 column 12>
                        Call:   expression
                          Call:   operator_expression
                            Consumed token: <<code>: "A" at line 1 column 13>
                            Consumed token: <<op>: "/" at line 1 column 14>
                            Call:   expression
                              Consumed token: <<code>: "-B" at line 1 column 15>
                            Return: expression
                          Return: operator_expression
                        Return: expression
                      Return: operator_expression
                    Return: expression
                    Consumed token: <")" at line 1 column 17>
                  Return: parenthesis_expression
                Return: negated_expression
              Return: expression
            Return: operator_expression
          Return: expression
        Return: operator_expression
      Return: expression
    Return: file
    com.testing.ParseException: Encountered " ")" ") "" at line 1, column 18.
    Was expecting:
        <EOF> 
    
      at com.testing.CodeParser.generateParseException(CodeParser.java:354)
      at com.testing.CodeParser.jj_consume_token(CodeParser.java:238)
      at com.testing.CodeParser.file(CodeParser.java:34)
      at com.testing.CodeParser.main(CodeParser.java:22)
    

    Voila! An exception.
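
The effect of the missing `<EOF>` check can also be reproduced outside JavaCC. The following is a minimal hand-written sketch in plain Java, not the generated parser: the class name `EofCheckSketch`, the regex-based tokenizer, and the helpers `acceptsWithoutEof` and `acceptsWithEof` are assumptions made for illustration. It mirrors the productions of the grammar above and shows that parsing `expression` alone stops quietly before the trailing `)`, while a `file`-style rule that insists on end of input rejects it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EofCheckSketch {
    // Same token shapes as the grammar: code (optional leading '-'), op, not, parentheses.
    // The code alternative comes first so that "-B" becomes one token, as in the trace.
    private static final Pattern TOKEN = Pattern.compile("-?[A-Z0-9]+|[+/]|-|\\(|\\)");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    EofCheckSketch(String input) {
        Matcher m = TOKEN.matcher(input);
        int end = 0;
        while (m.find()) {
            if (m.start() != end) {
                throw new IllegalArgumentException("bad character at index " + end);
            }
            tokens.add(m.group());
            end = m.end();
        }
        if (end != input.length()) {
            throw new IllegalArgumentException("bad character at index " + end);
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null; // null stands in for EOF
    }

    private void expect(String expected) {
        if (!expected.equals(peek())) {
            throw new IllegalStateException("expected " + expected + " but found " + peek());
        }
        pos++;
    }

    // expression : not parenthesis | parenthesis | code op expression | code
    void expression() {
        String t = peek();
        if ("-".equals(t)) {
            pos++;
            parenthesis();
        } else if ("(".equals(t)) {
            parenthesis();
        } else if (t != null && t.matches("-?[A-Z0-9]+")) {
            pos++;
            if ("+".equals(peek()) || "/".equals(peek())) {
                pos++;
                expression();
            }
        } else {
            throw new IllegalStateException("unexpected token " + t);
        }
    }

    void parenthesis() {
        expect("(");
        expression();
        expect(")");
    }

    // file : expression <EOF>
    void file() {
        expression();
        if (peek() != null) {
            throw new IllegalStateException("expected <EOF> but found " + peek());
        }
    }

    static boolean acceptsWithoutEof(String s) {
        try { new EofCheckSketch(s).expression(); return true; }
        catch (RuntimeException e) { return false; }
    }

    static boolean acceptsWithEof(String s) {
        try { new EofCheckSketch(s).file(); return true; }
        catch (RuntimeException e) { return false; }
    }

    public static void main(String[] args) {
        String s = "A+BC+-(2XXL+A/-B))";
        System.out.println("expression() accepts: " + acceptsWithoutEof(s)); // true
        System.out.println("file() accepts: " + acceptsWithEof(s));          // false
    }
}
```

Starting from `expression` the unbalanced string is accepted, because nothing ever inspects the leftover `)`. Starting from `file` the same string is rejected, which is the behaviour the `<EOF>` token adds to the JavaCC grammar.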