
How do I escape an escape character with ANTLR 4?


Many languages delimit strings with some kind of quote character, like this:

"Rob Malda is smart."

ANTLR 4 can match such a string with a lexer rule like this:

QuotedString : '"' .*? '"';

Certain characters can only appear inside the string if they are escaped, perhaps like this:

"Rob \"Commander Taco\" Malda is smart."

ANTLR 4 can match this string as well:

EscapedString : '"' ('\\"' | .)*? '"';

(taken from p96 of The Definitive ANTLR 4 Reference)
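As a rough illustration of how the non-greedy loop handles the backslash escape, here is an analogue using Python's `re` module, whose lazy `*?` behaves similarly in this case. This is a sketch under that assumption, not ANTLR's lexer: the regex translation is ours, the `\\"` alternative is listed first because regex alternation is tried left to right, and `re.DOTALL` makes `.` match newlines as ANTLR's `.` does.

```python
import re

# Regex analogue (not ANTLR itself) of:
#   EscapedString : '"' ('\\"' | .)*? '"';
# At a backslash the lazy loop cannot exit (the closing '"' does not match),
# so it consumes \" as one unit and keeps going.
ESCAPED_STRING = re.compile(r'"(?:\\"|.)*?"', re.DOTALL)

text = r'"Rob \"Commander Taco\" Malda is smart."'
for token in ESCAPED_STRING.findall(text):
    print(token)  # prints the whole literal as a single token
```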

Here's my problem: suppose that the escape character is the same as the string delimiter. For example:

"Rob ""Commander Taco"" Malda is smart."

(This is perfectly legal in PowerShell.)

What lexer rule would match this? I expected this to work:

EscapedString : '"' ('""'|.)*? '"';

But it doesn't: the lexer treats the first " of each "" pair (the escape character) as the closing delimiter and ends the string there.
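The failure can be reproduced with a Python `re` analogue of the rule. This is only an approximation of ANTLR's non-greedy behavior, assumed close enough here: a lazy loop tries to exit before attempting another iteration, so at the first quote of a "" pair the closing '"' already matches and the '""' alternative is never tried.

```python
import re

# Regex analogue (not ANTLR itself) of the rule that fails:
#   EscapedString : '"' ('""'|.)*? '"';
# re.DOTALL lets '.' match newlines, like ANTLR's '.' wildcard.
LAZY_DOUBLED = re.compile(r'"(?:""|.)*?"', re.DOTALL)

text = '"Rob ""Commander Taco"" Malda is smart."'
for token in LAZY_DOUBLED.findall(text):
    print(token)
# "Rob "
# "Commander Taco"
# " Malda is smart."
```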


Solution

  • Negate certain characters with the ~ operator:

    EscapedString : '"' ( '""' | ~["] )* '"';
    

    Or, if your strings cannot contain line breaks:

    EscapedString : '"' ( '""' | ~["\r\n] )* '"';
    

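    Don't use the non-greedy operator here. With the greedy loop, a " followed by another " is consumed as '""' and only a lone " ends the token; with the non-greedy loop, "" would never be consumed, and "a""b" would be tokenized as "a" and "b" instead of a single token.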
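Roughly speaking, the greedy rule works because the decision at each " depends on the character after it: another " means an escaped quote, anything else means the end of the string. The Python sketch below hand-codes that decision. The function name `scan_doubled_quote_string`, its `allow_newlines` flag, and raising `ValueError` on unterminated input are hypothetical choices for illustration only; ANTLR's generated lexer reports errors differently.

```python
def scan_doubled_quote_string(text: str, start: int, allow_newlines: bool = True) -> int:
    """Return the index just past a "..." literal that begins at text[start].

    Hypothetical helper sketching the greedy rules
        EscapedString : '"' ( '""' | ~["] )* '"';
    and, when allow_newlines is False, the variant that also excludes CR/LF.
    """
    if start >= len(text) or text[start] != '"':
        raise ValueError("no opening quote at start")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Peek one character ahead: "" is an escaped quote, so stay
            # inside the string; a lone " closes the literal.
            if i + 1 < len(text) and text[i + 1] == '"':
                i += 2
                continue
            return i + 1
        if not allow_newlines and ch in "\r\n":
            break  # line breaks are not allowed inside the literal
        i += 1
    raise ValueError("unterminated string literal")


if __name__ == "__main__":
    source = '"Rob ""Commander Taco"" Malda is smart." rest'
    end = scan_doubled_quote_string(source, 0)
    print(source[:end])  # "Rob ""Commander Taco"" Malda is smart."
```

With this logic "a""b" comes back as one literal, while "a" "b" stops after the first "a", since the quote there is followed by a space.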