I'm trying to build a PCRE engine, and I'm using this ANTLR grammar. These are some of its rules:
octal_char
 : ( Backslash (D0 | D1 | D2 | D3) octal_digit octal_digit
   | Backslash octal_digit octal_digit
   )
 ;
octal_digit
 : D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7
 ;
digit
 : D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 // just '0','1','2','3',...,'9'
 ;
When I try to trigger the octal_char rule with a string like \075, the rule is never matched, and I don't understand why.
Example parse tree for the string \075:
parse
  alternation
    expr
      element
        atom
          shared_atom \0
      element
        atom
          literal
            shared_literal
              digit 7
      element
        atom
          literal
            shared_literal
              digit 5
  <EOF>
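In the atom rule, move the backreference alternative up. When several alternatives can match the same input, ANTLR picks the one listed first, so an alternative placed before backreference consumes \0 on its own and leaves 7 and 5 to be parsed as plain digits. So instead of: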
atom
 : ...
 | backreference
 | ...
 ;
do:
atom
 : backreference
 | ...
 ;
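To see why the order matters, here is a small Python toy model. It is only a sketch, not ANTLR's actual prediction algorithm: it simply takes the first alternative that matches at each position. The three patterns are hypothetical stand-ins for alternatives of the atom rule, and it assumes (as the fix above implies) that octal_char is reached through backreference.

```python
import re

# Hypothetical stand-ins for three alternatives of the atom rule.
# The real grammar has many more; these names only mirror the parse tree above.
SHARED_ATOM = ("shared_atom", re.compile(r"\\0"))  # matches just the two characters \0
BACKREFERENCE = ("backreference", re.compile(r"\\[0-3][0-7]{2}|\\[0-7]{2}"))  # octal_char
LITERAL = ("literal", re.compile(r"[0-9]"))  # a single digit


def parse_atoms(text, alternatives):
    """Split text into atoms, always taking the first alternative that matches."""
    pos, result = 0, []
    while pos < len(text):
        for name, pattern in alternatives:
            match = pattern.match(text, pos)
            if match:
                result.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no alternative matches at position {pos}")
    return result


# backreference listed after shared_atom: \0 is consumed first, 7 and 5 are left over
before = parse_atoms(r"\075", [SHARED_ATOM, BACKREFERENCE, LITERAL])
# backreference listed first: the whole \075 is matched as one octal escape
after = parse_atoms(r"\075", [BACKREFERENCE, SHARED_ATOM, LITERAL])

print(before)  # [('shared_atom', '\\0'), ('literal', '7'), ('literal', '5')]
print(after)   # [('backreference', '\\075')]
```

The first result has the same shape as the parse tree in the question; the second is what you get once backreference is tried first.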
FYI: the grammar you're using is based on the document http://www.pcre.org/pcre.txt from 10 January 2012. The current revision is from 14 June 2021, so there are quite a few changes that the ANTLR grammar you're using does not account for yet.
I just updated the grammar and opened a PR: https://github.com/antlr/grammars-v4/pull/3690 (the original repo, with better test cases, can be found here: https://github.com/bkiers/pcre-parser)