Tags: antlr, grammar, antlr4, lexer, quoted-identifier

No way to implement a q quoted string with custom delimiters in Antlr4


I'm trying to implement a lexer rule for Oracle's q-quoted string mechanism, where a literal looks like q'$some string$'

Any character can take the place of $ except whitespace, (, {, [ and <, but the string must start and end with the same character. Some examples of accepted tokens:

    q'!some string!'
    q'ssome strings'

In the second example s is the custom delimiter, yet it may also appear inside the string, because the literal only ends at s'
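To make the termination rule concrete, here is a minimal hand-written sketch in plain Java, independent of ANTLR. The class name QQuoteScanner and the method scan are hypothetical names chosen for illustration, and the sketch assumes a lowercase q prefix and the same forbidden-delimiter set as the ~[...] set in the lexer rule from the question.

```java
final class QQuoteScanner {
    // Characters ruled out as delimiters (assumed to mirror the ~[...] set in the lexer rule).
    private static final String FORBIDDEN = " ({[<'\"\t\n\r";

    /**
     * Returns the index just past the q-quoted literal that starts at start,
     * or -1 if no complete literal starts there.
     */
    static int scan(String s, int start) {
        // The shortest possible literal is q'dd' (empty body): five characters.
        if (start < 0 || start + 5 > s.length()) return -1;
        if (s.charAt(start) != 'q' || s.charAt(start + 1) != '\'') return -1;
        char delim = s.charAt(start + 2);
        if (FORBIDDEN.indexOf(delim) >= 0) return -1;
        // Stop at the first "delimiter followed by a quote";
        // a delimiter followed by anything else is ordinary content.
        for (int i = start + 3; i + 1 < s.length(); i++) {
            if (s.charAt(i) == delim && s.charAt(i + 1) == '\'') return i + 2;
        }
        return -1; // unterminated literal
    }

    public static void main(String[] args) {
        String input = "q'ssome strings' q'!foo!'";
        int end = scan(input, 0);
        System.out.println(input.substring(0, end)); // prints q'ssome strings'
    }
}
```

The key point is the two-character lookahead: the delimiter alone never ends the literal, only the delimiter immediately followed by a quote does.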

Here's how I was trying to implement the rule:

Q_QUOTED_LITERAL: Q_QUOTED_LITERAL_NON_TERMINATED . QUOTE -> type(QUOTED_LITERAL);

Q_QUOTED_LITERAL_NON_TERMINATED:
    Q QUOTE ~[ ({[<'"\t\n\r] { setDelimChar( (char)_input.LA(-1) ); } 
    ( . { !isValidEndDelimChar() }? )* 
;

I have checked the value returned by !isValidEndDelimChar(), and the predicate does evaluate to false at the right position, so everything should work, but ANTLR simply ignores the predicate. I have also tried moving the predicate around, putting that part in a separate rule, and a number of other things. After a day and a half of research, I'm finally raising this question.

I have also tried other approaches, but there doesn't seem to be a way to implement a string with a custom delimiter character in ANTLR 4 (the ANTLR 3 version used to work).


Solution

  • Not sure why the { ... } action isn't invoked, but it's not needed. The following grammar worked for me (note that the predicate goes in front of the . rather than after it):

    grammar Test;
    
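    @lexer::members {
      // getText() is the text matched so far for the current token:
      // 'q', '\'', then the delimiter at index 2.
      // The literal ends when the next char is that delimiter and the one after it is a quote.
      boolean isValidEndDelimChar() {
        return (_input.LA(1) == getText().charAt(2)) && (_input.LA(2) == '\'');
      }
    }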
    
    parse
     : .*? EOF
     ;
    
    Q_QUOTED_LITERAL
     : 'q\'' ~[ ({[<'"\t\n\r] ( {!isValidEndDelimChar()}? . )* . '\''
     ;
    
    SPACE
     : [ \t\f\r\n] -> skip
     ;
    

    If you run the class:

    import org.antlr.v4.runtime.*;
    
    public class Main {
    
      public static void main(String[] args) {
    
        Lexer lexer = new TestLexer(CharStreams.fromString("q'ssome strings' q'!foo!'"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();
    
        for (Token t : tokens.getTokens()) {
          System.out.printf("%-20s %s\n", TestLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
        }
      }
    }
    

    the following output will be printed:

    Q_QUOTED_LITERAL     q'ssome strings'
    Q_QUOTED_LITERAL     q'!foo!'
    EOF                  <EOF>
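
    Once the token is recognized, the text between the delimiters can be recovered from t.getText() with plain string slicing. The following is only a sketch: the class name QQuoteText and the method body are hypothetical helpers, not part of ANTLR, and it assumes the input is already a well-formed Q_QUOTED_LITERAL token text.

```java
final class QQuoteText {
    /** Returns the text between the delimiters of a q-quoted literal such as q'!foo!'. */
    static String body(String literal) {
        // Layout: q ' <delim> body <delim> '
        // so the body starts at index 3 and ends two characters before the end.
        if (literal.length() < 5) {
            throw new IllegalArgumentException("not a q-quoted literal: " + literal);
        }
        return literal.substring(3, literal.length() - 2);
    }

    public static void main(String[] args) {
        System.out.println(body("q'ssome strings'")); // prints some string
        System.out.println(body("q'!foo!'"));         // prints foo
    }
}
```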