Define token to match any string


I am new to JavaCC. I am trying to define a token that matches any string. I tried the regular expression <ANY: (~[])+>, but it does not work. What I want is simple: to parse an expression with the following BNF:

<exp> ::= "path(" <string> "," <number> ")"

My current .jj file is below. Any help on how to parse the string would be appreciated:

options
{
}
PARSER_BEGIN(SimpleAdd)
package SimpleAddTest;
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
    " "
|   "\r"
|   "\t"
|   "\n"
}
TOKEN:
{
    < NUMBER: (["0"-"9"])+  > |
    <PATH: "path"> |
    <RPAR: "("> |
    <LPAR: ")"> |
    <QUOTE: "'"> |
    <COMMA: ","> |
    <ANY: (~[])+>


}

int expr():
{
    String leftValue ;
    int rightValue ;
}
{

        <PATH> <RPAR> <QUOTE> leftValue = str() <QUOTE> <COMMA> rightValue = num() <LPAR>
    { return 0; }
}

String str():
{
    Token t;
}
{

    t = <ANY> { return t.toString(); }
}

int num():
{
    Token t;
}
{
    t = <NUMBER> { return Integer.parseInt(t.toString()); }
}

The error I am getting with the above javacc file is:

Exception in thread "main" SimpleAddTest.ParseException: Encountered " <ANY> "path(\'5\',1) "" at line 1, column 1.
Was expecting:
    "path" ...

Solution

  • The pattern <ANY: (~[])+> does match any nonempty string, and that is exactly the problem. The lexer applies the longest-match rule, so at the start of the input ANY matches everything up to the end of the file, which is longer than what PATH matches. Unless the file is empty, the token stream will usually be just [ANY, EOF], which is why the parser reports <ANY> where it was expecting "path". Is that really what you want? Probably not.

    So I'm going to guess at what you really want. I'll guess you want any string that doesn't include a double quote character. (Your sample input uses single quotes; if that is what you want, use the single quote character in the STRING token instead.) Maybe there are other restrictions, such as no nonprinting characters. Maybe you want to allow double quotes if they are preceded by a backslash. Who knows? Adjust as needed.

    Here is what you can do. First, replace the token definitions with

    TOKEN:
    {
        < NUMBER: (["0"-"9"])+  > |
        <PATH: "path"> |
        <RPAR: "("> |
        <LPAR: ")"> |
        <COMMA: ","> |
        <STRING: "\"" (~["\""])* "\"" >
    }
    

    Then change your grammar to

    int expr():
    {
        String leftValue ;
        int rightValue ;
    }
    {    
            <PATH> <RPAR> leftValue=str() <COMMA> rightValue = num() <LPAR>
        { return 0; }
    }
    
    String str():
    {
        Token t;
        int len ;
    }
    {    
        t = <STRING>
        { len = t.image.length() ; }
        { return t.image.substring(1,len-1); }
    }