I'm building a kind of preprocessor in ANTLR v3, which of course only works with fuzzy parsing. At the moment I'm trying to parse include statements and replace them with the corresponding file content. Based on an existing example, I wrote the following grammar:
grammar preprocessor;

options {
  language='Java';
}

@lexer::header {
  package antlr_try_1;
}

@parser::header {
  package antlr_try_1;
}

parse
  : (t=. {System.out.print($t.text);})* EOF
  ;

INCLUDE_STAT
  : 'include' (' ' | '\r' | '\t' | '\n')+ ('A'..'Z' | 'a'..'z' | '_' | '-' | '.')+
    {
      setText("Include statement found!");
    }
  ;

Any
  : . // fall through rule, matches any character
  ;
This grammar only prints the text and replaces each include statement with the string "Include statement found!". The example text to be parsed looks like this:
some random input
some random input
some random input
include some_file.txt
some random input
some random input
some random input
The output looks like this:
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 1:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 2:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 3:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 7:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 8:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 9:14 mismatched character 'p' expecting 'c'
some random ut
some random ut
some random ut
Include statement found!
some random ut
some random ut
some random ut
As far as I can judge, the lexer is confused by the "in" in the word "input": it "thinks" it is at the start of an INCLUDE_STAT token, then fails when it reads 'p' where it expects 'c'.
Is there a better way to do it? I cannot use the filter option, since I need not only the include statements but also the rest of the code. I've tried several other things but couldn't find a proper solution.
You are observing one of ANTLR 3's limitations. You could use either of these options to correct the immediate problem:
Include the following syntactic predicate at the beginning of the INCLUDE_STAT rule:
`('include' (' ' | '\r' | '\t' | '\n')+ ('A'..'Z' | 'a'..'z' | '_' | '-' | '.')+) =>`
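For illustration, here is a sketch of how the INCLUDE_STAT rule from the question might look with that predicate in place. It is the question's rule with the predicate prepended and nothing else changed. It has not been run against the sample input here, so treat it as a starting point rather than a verified fix.

```antlr
INCLUDE_STAT
  : // Syntactic predicate: speculatively match the whole include statement
    // before committing to this rule.
    ('include' (' ' | '\r' | '\t' | '\n')+ ('A'..'Z' | 'a'..'z' | '_' | '-' | '.')+) =>
    // Same body as in the question.
    'include' (' ' | '\r' | '\t' | '\n')+ ('A'..'Z' | 'a'..'z' | '_' | '-' | '.')+
    {
      setText("Include statement found!");
    }
  ;

Any
  : . // fall through rule, matches any character
  ;
```

With the predicate, the lexer should only choose INCLUDE_STAT when the full statement can be matched. For the "in" of "input" the speculation fails, so those characters should fall through to the Any rule instead of producing a mismatched-character error.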