regex parsing antlr antlr4 pcre

I can't trigger ANTLR rule


I'm trying to build a PCRE engine, and I'm using this ANTLR grammar. These are some of its rules:

octal_char
 : ( Backslash (D0 | D1 | D2 | D3) octal_digit octal_digit
   | Backslash octal_digit octal_digit
   )
 ;

octal_digit
 : D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7
 ;

digit
 : D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 // just '0','1','2','3',...,'9'
 ;

When I try to trigger the octal_char rule with a string such as \075, the rule is never matched, and I don't understand why.

Example parse tree for the string \075:

parse
  alternation
    expr
      element
        atom
          shared_atom \0
      element
        atom
          literal
            shared_literal
              digit 7
      element
        atom
          literal
            shared_literal
              digit 5
  <EOF>

Solution

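  • In the atom rule, move the backreference alternative to the top. When several alternatives can match the same input, ANTLR 4 picks the one listed first, which is why \0 ends up under shared_atom in your parse tree. So instead of: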

    atom
     : ...
     | backreference
     | ...
     ;
    

    do:

    atom
     : backreference
     | ...
     ;
    

    FYI: the grammar you're using is based on the document http://www.pcre.org/pcre.txt from 10 January 2012. The current revision is from 14 June 2021, so quite a few changes are not yet accounted for in that ANTLR grammar.

    Edit

    I just updated the grammar and made a PR: https://github.com/antlr/grammars-v4/pull/3690 (original repo with better test cases can be found here: https://github.com/bkiers/pcre-parser)
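
To see why the order of alternatives in atom matters, here is a toy model in Python. It is only a sketch: it tries alternatives strictly in the listed order, which is much simpler than ANTLR's real prediction algorithm, and the three patterns are hypothetical simplifications rather than the grammar's actual rules. It assumes that shared_atom accepts a bare \0 (as the parse tree in the question suggests) and that octal_char is reached through backreference (as the fix implies).

```python
import re

# Hypothetical stand-ins for three alternatives of the `atom` rule.
# These patterns are simplifications for illustration, not the real grammar.
ALTERNATIVES = {
    # Assumed to accept a bare \0, as the parse tree in the question suggests.
    "shared_atom": re.compile(r"\\0"),
    # Same shape as octal_char in the question: \[0-3]oo or \oo.
    "backreference": re.compile(r"\\[0-3][0-7]{2}|\\[0-7]{2}"),
    # A single digit, standing in for literal -> shared_literal -> digit.
    "literal": re.compile(r"[0-9]"),
}


def parse_atoms(text, order):
    """Consume text left to right, taking the first alternative in `order` that matches."""
    pos, atoms = 0, []
    while pos < len(text):
        for name in order:
            match = ALTERNATIVES[name].match(text, pos)
            if match:
                atoms.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no alternative matches at position {pos}")
    return atoms


# backreference listed after shared_atom: \0 is taken first, 7 and 5 become literals.
print(parse_atoms(r"\075", ["shared_atom", "backreference", "literal"]))
# [('shared_atom', '\\0'), ('literal', '7'), ('literal', '5')]

# backreference listed first: the whole octal escape is matched at once.
print(parse_atoms(r"\075", ["backreference", "shared_atom", "literal"]))
# [('backreference', '\\075')]
```

The first call reproduces the shape of the parse tree from the question, and the second shows the effect of moving backreference to the top.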