How to exclude " and \ in ANTLR 4 string matching?


I have the following string that I want to match against the rule stringLiteral:

"D:\\Downloads\\Java\\MyFile"

My grammar is in the file String.g4, as follows:

grammar String;

fragment
HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;

stringLiteral
    :  '"' ( EscapeSequence | XXXXX  )* '"'
    ;
fragment
EscapeSequence
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UnicodeEscape
    |   OctalEscape
    ;

fragment
OctalEscape
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UnicodeEscape
    :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit
    ;

What should I put in the XXXXX location in order to match any character that is not \ or "?

I tried the following, and none of them work:

~['\\'"']
~['\\'\"']
~["\]
~[\"\\]
~('\"'|'\\')
~[\\\"]

I am using ANTLRWorks 2 to try this out. Errors are the following:

D:\Downloads\ANTLR\String.g4 line 26:5 mismatched character '<EOF>' expecting '"'
error(50): D:\Downloads\ANTLR\String.g4:26:5: syntax error: '<EOF>' came as a complete surprise to me while looking for rule element

Solution

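  • Inside a character class, the double quote needs no escaping; of the two characters you want to exclude, only the backslash must be escaped:

    The following is illegal because the backslash escapes the ], leaving the class unterminated: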

    [\]
    

    The following matches a backslash:

    [\\]
    

    The following matches a quote:

    ["]
    

    And the following matches either a backslash or quote:

    [\\"]
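
    As a minimal sketch, these escaping rules can be written out as lexer fragments. The fragment names below are hypothetical and chosen only for illustration; they are not part of the original grammar:

    ```antlr
    // Hypothetical fragments illustrating character-class escaping in ANTLR 4.
    fragment Backslash           : [\\] ;    // a single backslash
    fragment Quote               : ["] ;     // a double quote needs no escape inside [...]
    fragment CloseBracket        : [\]] ;    // \] is a literal ], so the class still needs its own closing ]
    fragment NotQuoteOrBackslash : ~[\\"] ;  // any character except \ and "
    ```

    The last fragment is the negated set that fills the XXXXX slot in the question.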
    

    In v4 style, your grammar could look like this:

    grammar String;
    
    /* other rules */
    
    StringLiteral
        :  '"' ( EscapeSequence | ~[\\"]  )* '"'
        ;
    
    fragment
    HexDigit 
        : [0-9a-fA-F] 
        ;
    
    fragment
    EscapeSequence
        :   '\\' [btnfr"'\\]
        |   UnicodeEscape
        |   OctalEscape
        ;
    
    fragment
    OctalEscape
        :   '\\' [0-3] [0-7] [0-7]
        |   '\\' [0-7] [0-7]
        |   '\\' [0-7]
        ;
    
    fragment
    UnicodeEscape
        :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit
        ;
    

    Note that you can't use fragment rules inside parser rules, so StringLiteral must be a lexer rule (its name must start with an uppercase letter).
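
    As a sketch of that last point, a parser rule can reference the StringLiteral token by name, while the fragments stay confined to lexer rules. The parser rule name value is a hypothetical placeholder, not something from the original grammar:

    ```antlr
    // Hypothetical parser rule (lowercase name). It may reference lexer tokens
    // such as StringLiteral, but not fragment rules such as EscapeSequence.
    value
        :   StringLiteral
        ;
    ```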