
How to write the Tcl nested string rule in ANTLR?


A nested Tcl string can look like this:

{abc {xyz foo {hello world}}}

The braces above only enclose the content of the string; they are not part of the string itself (similar to double quotes). A brace can be escaped as "\{" or "\}", for example to change the string "foo" to "foo{}":

{abc {xyz foo\{\} {hello world}}}

I have a working lexer rule for the case without brace escaping:

NestedBraces
  :  '{' ( ~('{'|'}') | NestedBraces)* '}'
  ;

I am trying to find a way to add the escaping part while keeping the nested syntax, and haven't succeeded so far.


Solution

  • Try this:

    NestedBraces
     : '{' (~('{' | '}' | '\\') | '\\' ('{' | '}') | NestedBraces)* '}'
     ;
    

    In the inner loop, you match:

    ~('{' | '}' | '\\')   // any character other than '{', '}' or '\'
    |                     // OR
    '\\' ('{' | '}')      // an escaped '{' or '}' 
    |                     // OR
    NestedBraces          // recursive call: '{' ... '}'
    

    If you also want to strip the unescaped braces (and the escaping backslashes) from the token text in one go, do something like this:

    NestedBraces
     : Helper {setText($text.replaceAll("\\\\(.)|[{}]", "$1"));}
     ;
    
    fragment Helper
     : '{' (~('{' | '}' | '\\') | '\\' ('{' | '}') | Helper)* '}'
     ;
    

    This creates a token with the text "abc xyz foo{ hello world" for the input "{abc {xyz foo\{ {hello world}}}". Note that you need the helper rule: you can't do both the replacement and the recursive call in a single rule.
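
To check the behaviour outside of ANTLR, here is a minimal plain-Java sketch. It is not generated lexer code: the class name `NestedBracesSketch` and the method names `matchEnd` and `strip` are made up for illustration. `matchEnd` is a hand-written approximation of the three alternatives in the Helper rule, and `strip` applies the same `replaceAll` call as the lexer action.

```java
final class NestedBracesSketch {

    /**
     * Hand-written approximation of the Helper rule (hypothetical helper).
     * Returns the index just past the '}' that closes the '{' at position
     * start, or -1 if the text does not match.
     */
    static int matchEnd(String s, int start) {
        if (start >= s.length() || s.charAt(start) != '{') {
            return -1;
        }
        int i = start + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '}') {
                return i + 1;              // closing brace of this level
            } else if (c == '{') {
                i = matchEnd(s, i);        // recursive call: '{' ... '}'
                if (i < 0) {
                    return -1;
                }
            } else if (c == '\\') {
                // the rule only allows a backslash in front of '{' or '}'
                boolean escapedBrace = i + 1 < s.length()
                        && (s.charAt(i + 1) == '{' || s.charAt(i + 1) == '}');
                if (!escapedBrace) {
                    return -1;
                }
                i += 2;
            } else {
                i++;                       // any other character
            }
        }
        return -1;                         // input ended before the closing brace
    }

    /**
     * Same replacement as in the lexer action: an escaped character is
     * replaced by the character itself (group 1); a bare brace matches the
     * second alternative, where group 1 is unset and expands to nothing.
     */
    static String strip(String tokenText) {
        return tokenText.replaceAll("\\\\(.)|[{}]", "$1");
    }

    public static void main(String[] args) {
        String input = "{abc {xyz foo\\{ {hello world}}}";
        if (matchEnd(input, 0) == input.length()) {
            System.out.println(strip(input)); // prints: abc xyz foo{ hello world
        }
    }
}
```

As in the grammar rule, a backslash followed by anything other than a brace makes the match fail in this sketch. If such input can occur, both the rule and the sketch would need an extra alternative.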