I'm just starting to explore ANTLR and I'm trying to match this format: (test123 A0020 )
Where :
I tried this grammar :
IDENTIFIER
    : ( LETTER | DIGIT )+
    ;

INT
    : DIGIT+
    ;

fragment DIGIT
    : [0-9]
    ;

fragment LETTER
    : [A-Z]
    ;

WS : [ \t\r\n(\s)+]+ -> channel(HIDDEN) ;

formatter
    : '(' information ')'
    ;

information
    : information '/' 'A' INT
    | IDENTIFIER
    ;
How can I resolve the ambiguity so that the time format is matched as 'A' INT and not as IDENTIFIER? Also, how can I add checks, such as a token length limit, to the identifier? I know that this doesn't work in ANTLR: IDENTIFIER : (DIGIT | LETTER ) {2,10}
UPDATE:
I changed the rules to use semantic predicates, but I still have the same ambiguity between the identifier and the time format. Here are the modified rules:
formatter
    : information
    | information '-' time
    ;

time
    : timeMode timeCode
    ;

timeMode
    : { getCurrentToken().getText().matches("[A,C]")}? MOD
    ;

timeCode
    : {getCurrentToken().getText().matches("[0-9]{4}")}? INT
    ;

information
    : {getCurrentToken().getText().length() <= 10 }? IDENTIFIER
    ;

MOD : 'A' | 'C' ;
The problem shows up in the parse tree: A0023 is matched to timeMode, and the parser complains that the timeCode is missing.
Here is a way to handle it:
grammar Test;

@lexer::members {
    // True if the input starting at the current token's first character
    // (at most maxAmountOfCharacters characters) matches the given regex.
    private boolean isAhead(int maxAmountOfCharacters, String pattern) {
        final Interval ahead = new Interval(this._tokenStartCharIndex, this._tokenStartCharIndex + maxAmountOfCharacters - 1);
        return this._input.getText(ahead).matches(pattern);
    }
}

parse
    : formatter EOF
    ;

formatter
    : information ( '-' time )?
    ;

time
    : timeMode timeCode
    ;

timeMode
    : TIME_MODE
    ;

timeCode
    : {getCurrentToken().getType() == IDENTIFIER_OR_INTEGER && getCurrentToken().getText().matches("\\d{4}")}?
      IDENTIFIER_OR_INTEGER
    ;

information
    : {getCurrentToken().getType() == IDENTIFIER_OR_INTEGER && getCurrentToken().getText().matches("\\w*[a-zA-Z]\\w*")}?
      IDENTIFIER_OR_INTEGER
    ;

IDENTIFIER_OR_INTEGER
    : {!isAhead(6, "[AP]\\d{4}(\\D|$)")}? [a-zA-Z0-9]+
    ;

TIME_MODE
    : [AP]
    ;

SPACES
    : [ \t\r\n] -> skip
    ;
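To see what the lexer predicate decides at each position, the lookahead can be replayed outside ANTLR. The following is only a sketch in plain Java: the class name `LookaheadSketch` and the method `identifierAllowedAt` are made up for illustration, and the input is passed as a `String` instead of being read from the lexer's character stream. The regex and the window of 6 characters are the ones used in the grammar.

```java
// Plain-Java sketch of the lexer's lookahead check; no ANTLR dependency.
// Class and method names are hypothetical, not part of the generated lexer.
final class LookaheadSketch {

    // Same pattern as in the IDENTIFIER_OR_INTEGER predicate of the grammar.
    static final String TIME_PATTERN = "[AP]\\d{4}(\\D|$)";

    // Mirrors isAhead(...): take at most maxChars characters starting at
    // tokenStart (clamped to the end of the input) and match them against the regex.
    static boolean isAhead(String input, int tokenStart, int maxChars, String pattern) {
        int end = Math.min(input.length(), tokenStart + maxChars);
        return input.substring(tokenStart, end).matches(pattern);
    }

    // True when the predicate would let IDENTIFIER_OR_INTEGER start at tokenStart.
    static boolean identifierAllowedAt(String input, int tokenStart) {
        return !isAhead(input, tokenStart, 6, TIME_PATTERN);
    }

    public static void main(String[] args) {
        String input = "1P23 - A0023";
        System.out.println(identifierAllowedAt(input, 0)); // window "1P23 -"
        System.out.println(identifierAllowedAt(input, 7)); // window "A0023"
        System.out.println(identifierAllowedAt(input, 8)); // window "0023"
    }
}
```

At offset 7 the predicate fails, so only `TIME_MODE` can match the `A`; the remaining `0023` is then lexed as `IDENTIFIER_OR_INTEGER` and accepted by the `timeCode` parser predicate.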
A small test class:
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {

    // Prints the LISP-style tree, starting a new indented line
    // at every nested opening parenthesis.
    private static void indent(String lispTree) {
        int indentation = -1;
        for (final char c : lispTree.toCharArray()) {
            if (c == '(') {
                indentation++;
                for (int i = 0; i < indentation; i++) {
                    System.out.print(i == 0 ? "\n " : " ");
                }
            }
            else if (c == ')') {
                indentation--;
            }
            System.out.print(c);
        }
    }

    public static void main(String[] args) throws Exception {
        TestLexer lexer = new TestLexer(new ANTLRInputStream("1P23 - A0023"));
        TestParser parser = new TestParser(new CommonTokenStream(lexer));
        indent(parser.parse().toStringTree(parser));
    }
}
will print:
(parse
 (formatter
  (information 1P23) -
  (time
   (timeMode A)
   (timeCode 0023))) <EOF>)
for the input "1P23 - A0023".
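The two parser predicates are ordinary regular-expression checks on the token text, so the length limit asked about in the question can be handled the same way. Below is a minimal sketch in plain Java, assuming the 2 to 10 character bound from the question's `{2,10}` attempt; the class and method names are hypothetical and only serve to try the checks without generating a parser.

```java
// Plain-Java sketch of the text checks used in the parser predicates.
// Names are illustrative only; nothing here is generated by ANTLR.
final class TokenTextChecks {

    // Same regex as the timeCode predicate: exactly four digits.
    static boolean isTimeCode(String text) {
        return text.matches("\\d{4}");
    }

    // Same regex as the information predicate: word characters, at least one letter.
    static boolean isInformation(String text) {
        return text.matches("\\w*[a-zA-Z]\\w*");
    }

    // Assumption: additionally enforce the 2..10 length bound from the question.
    static boolean isBoundedInformation(String text) {
        return isInformation(text) && text.length() >= 2 && text.length() <= 10;
    }

    public static void main(String[] args) {
        System.out.println(isInformation("1P23"));           // contains a letter
        System.out.println(isTimeCode("0023"));              // four digits
        System.out.println(isBoundedInformation("test123")); // 7 characters
    }
}
```

In the grammar itself, such a bound would presumably become an extra `&& getCurrentToken().getText().length() <= 10` clause in the `information` predicate, along the lines of the check already tried in the question's update.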
ANTLR can also display the parse tree in a UI component. If you do this instead:
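import java.util.Arrays;

// TreeViewer lives in org.antlr.v4.gui as of ANTLR 4.5.1 (shipped with the tool jar).
import org.antlr.v4.gui.TreeViewer;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {
    public static void main(String[] args) throws Exception {
        TestLexer lexer = new TestLexer(new ANTLRInputStream("1P23 - A0023"));
        TestParser parser = new TestParser(new CommonTokenStream(lexer));
        new TreeViewer(Arrays.asList(TestParser.ruleNames), parser.parse()).open();
    }
}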
the following dialog will appear:
Tested with ANTLR version 4.5.2-1