For a study project, I am using the following ANTLR grammar to parse query strings containing some simple boolean operators like AND, NOT and others:
grammar SimpleBoolean;
options { language = CSharp2; output = AST; }
tokens { AndNode; }
@lexer::namespace { INR.Infrastructure.QueryParser }
@parser::namespace { INR.Infrastructure.QueryParser }
LPARENTHESIS : '(';
RPARENTHESIS : ')';
AND : 'AND';
OR : 'OR';
ANDNOT : 'ANDNOT';
NOT : 'NOT';
PROX : **?**
fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9'|'ä'|'Ä'|'ü'|'Ü'|'ö'|'Ö');
fragment QUOTE : ('"');
fragment SPACE : (' '|'\n'|'\r'|'\t'|'\u000C');
WS : (SPACE) { $channel=Hidden; };
WORD : (~( ' ' | '\t' | '\r' | '\n' | '/' | '(' | ')' ))+; // '+' instead of '*' so that WORD can never match the empty string
PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
startExpression : andExpression;
andExpression : (andnotExpression -> andnotExpression) (AND? a=andnotExpression -> ^(AndNode $andExpression $a))*;
andnotExpression : orExpression (ANDNOT^ orExpression)*;
proxExpression : **?**
orExpression : notExpression (OR^ notExpression)*;
notExpression : (NOT^)? atomicExpression;
atomicExpression : PHRASE | WORD | LPARENTHESIS! andExpression RPARENTHESIS!;
Now I would like to add an operator for so-called proximity queries. For example, the query "A /5 B" should return everything that contains A with B following within the next 5 words. The number 5 could, of course, be any other positive integer. In other words, a proximity query should result in the following syntax tree:
http://graph.gafol.net/pic/ersaDEbBJ.png
Unfortunately, I don't know how to (syntactically) add such a "PROX" operator to my existing ANTLR grammar. Any help is appreciated. Thanks!
You could do that like this:
PROX : '/' '0'..'9'+;
...
startExpression : andExpression;
andExpression : (andnotExpression -> andnotExpression) (AND? a=andnotExpression -> ^(AndNode $andExpression $a))*;
andnotExpression : proxExpression (ANDNOT^ proxExpression)*;
proxExpression : orExpression (PROX^ orExpression)*;
orExpression : notExpression (OR^ notExpression)*;
notExpression : (NOT^)? atomicExpression;
atomicExpression : PHRASE | WORD | LPARENTHESIS! andExpression RPARENTHESIS!;
If you parse the input:
A /500 B OR D NOT E AND F
the following AST is created:
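As a rough cross-check of that tree shape, here is a minimal hand-written sketch in Python that mirrors the precedence of the parser rules above (OR binds tightest, then PROX, then ANDNOT, then the explicit or implicit AND). It is not ANTLR output: the tokenizer regex, the tuple-based tree, and the helper names `parse` and `show` are assumptions made for illustration, PHRASE tokens are left out, and there is no error handling for malformed input.

```python
import re

# Assumed tokenizer: PROX ('/' followed by digits), a parenthesis, or a run of
# characters that the WORD rule allows. Quoted PHRASE tokens are omitted.
TOKEN = re.compile(r"/\d+|[()]|[^\s/()]+")
KEYWORDS = {"AND", "OR", "ANDNOT", "NOT"}


def parse(text):
    """Parse a query into nested tuples of the form (operator, child, ...)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def is_prox(tok):
        return tok is not None and tok.startswith("/")

    def starts_operand(tok):
        # Tokens that can begin an andnotExpression: NOT, '(' or a word.
        if tok is None or tok == ")" or is_prox(tok):
            return False
        return tok in ("NOT", "(") or tok not in KEYWORDS

    def and_expr():
        node = andnot_expr()
        # AND is optional, so two adjacent operands are joined implicitly.
        while peek() == "AND" or starts_operand(peek()):
            if peek() == "AND":
                take()
            node = ("AndNode", node, andnot_expr())
        return node

    def andnot_expr():
        node = prox_expr()
        while peek() == "ANDNOT":
            node = (take(), node, prox_expr())
        return node

    def prox_expr():
        node = or_expr()
        while is_prox(peek()):
            node = (take(), node, or_expr())
        return node

    def or_expr():
        node = not_expr()
        while peek() == "OR":
            node = (take(), node, not_expr())
        return node

    def not_expr():
        if peek() == "NOT":
            return (take(), atom())
        return atom()

    def atom():
        tok = take()
        if tok == "(":
            node = and_expr()
            take()  # consume the closing ')'
            return node
        return tok

    return and_expr()


def show(node):
    """Render a tree as a parenthesised string, root first."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(show(child) for child in node) + ")"


print(show(parse("A /500 B OR D NOT E AND F")))
```

For the input above, the sketch prints `(AndNode (AndNode (/500 A (OR B D)) (NOT E)) F)`: the PROX token becomes the root of a subtree over `A` and `B OR D`, and because AND is optional, `NOT E` is joined to it by an implicit AndNode before the explicit `AND F` is applied.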